Cal Newport writes on autonomous A.I. agents and the predictions made by tech leaders such as Sam Altman and Andrej Karpathy, of OpenAI.
This was no small boast. Chatbots can respond directly to a text-based prompt—by answering a question, say, or writing a rough draft of an e-mail. But an agent, in theory, would be able to navigate the digital world on its own, and complete tasks that require multiple steps and the use of other software, such as web browsers.
Consider everything that goes into making a hotel reservation: deciding on the right nights, filtering based on one’s preferences, reading reviews, searching various websites to compare rates and amenities. An agent could conceivably automate all of these activities. The implications of such a technology would be immense. Chatbots are convenient for human employees to use; effective A.I. agents might replace the employees altogether. The C.E.O. of Salesforce, Marc Benioff, who has claimed that half the work at his company is done by A.I., predicted that agents will help unleash a “digital labor revolution,” worth trillions of dollars.
2025 was heralded as the Year of the A.I. Agent in part because, by the end of 2024, these tools had become undeniably adept at computer programming. A demo of OpenAI’s Codex agent, from May, showed a user asking the tool to modify his personal website. “Add another tab next to investment/tools that is called ‘food I like.’ In the doc put—tacos,” the user wrote. The chatbot quickly carried out a sequence of interconnected actions: it reviewed the files in the website’s directory, examined the contents of a promising file, then used a search command to find the right location to insert a new line of code. After the agent learned how the site was structured, it used this information to successfully add a new page that featured tacos. As a computer scientist myself, I had to admit that Codex was tackling the task more or less as I would. Silicon Valley grew convinced that other difficult tasks would soon be conquered.
As 2025 winds down, however, the era of general-purpose A.I. agents has failed to emerge.
This fall, Andrej Karpathy, a co-founder of OpenAI, who left the company and started an A.I.-education project, described agents as “cognitively lacking” and said, “It’s just not working.” Gary Marcus, a longtime critic of tech-industry hype, recently wrote on his Substack that “AI Agents have, so far, mostly been a dud.”
This gap between prediction and reality matters. Fluent chatbots and reality-bending video generators are impressive, but they cannot, on their own, usher in a world in which machines take over many of our activities. If the major A.I. companies cannot deliver broadly useful agents, then they may be unable to deliver on their promises of an A.I.-powered future.
This setup turns out to excel at automating software development. Most of the actions required to create or modify a computer program can be implemented by entering a limited set of commands into a text-based terminal. These commands tell a computer to navigate a file system, add or update text in source files, and, if needed, compile human-readable code into machine-readable bits. This is an ideal setting for L.L.M.s. “The terminal interface is text-based, and that is the domain that language models are based on,” Alex Shaw, the co-creator of Terminal-Bench, a popular tool used to evaluate coding agents, told me.
More generalized assistants, of the sort envisioned by Altman, would require agents to leave the comfortable constraints of the terminal. Since most of us complete computer tasks by pointing and clicking, an A.I. that can “join the workforce” probably needs to know how to use a mouse—a surprisingly difficult goal. The Times recently reported on a string of new startups that have been building “shadow sites”—replicas of popular webpages, like those of United Airlines and Gmail, on which A.I. can analyze how humans use a cursor.
In July, OpenAI released ChatGPT Agent, an early version of a bot that can use a web browser to complete tasks, but one review noted that “even simple actions like clicking, selecting elements, and searching can take the agent several seconds—or even minutes.” At one point, the tool got stuck for nearly a quarter of an hour trying to select a price from a real-estate site’s drop-down menu.
There’s another option to improve the capability of agents: make existing tools easier for the A.I. to master. One open-source effort aims to develop what’s known as Model Context Protocol, a standardized interface that allows agents to access software using text-based requests. Another is the Agent2Agent protocol, launched by Google last spring, which proposes a world in which agents interact directly with each other. My personal A.I. doesn’t have to use a hotel-reservation site if it can instead ask a dedicated A.I.—perhaps trained by the hotel company itself—to navigate the site on its behalf. Of course, it will take time to rebuild the infrastructure of the internet with bots in mind. And even if technologists can complete this project, or successfully master the mouse, they will face another challenge: the weaknesses of the L.L.M.s that underlie their agents’ decisions.
Other commentators warn that agents will amplify errors. As chatbot users quickly learn, L.L.M.s have a tendency to make things up; one popular benchmark reveals that various versions of GPT-5, OpenAI’s cutting-edge model, have a hallucination rate of around ten per cent. For an agent tackling a multi-step task, these semi-regular lapses might prove catastrophic: it only takes one misstep for the entire effort to veer off track. “Don’t get too excited about AI agents yet,” a Business Insider headline warned in the spring. “They make a lot of mistakes.”
To better understand how an L.L.M. brain could go astray, I asked ChatGPT to walk through the plan it would follow if it were powering a hotel-booking agent.
It described a sequence of eighteen steps and sub-steps: selecting the booking website, applying filters to the search results, entering credit-card information, sending me a summary of the reservation, and so on. I was impressed by how thoroughly the model could break down the activity. But I could also see places where our hypothetical agent might fall off track. Sub-step 4.4, for example, has the agent rank rooms using a formula: α* + β* − γ* + δ*. This is the right type of thing to do in this situation, but the L.L.M. left the details worrisomely underspecified. How would it calculate these penalty and bonus values, and how would it select the weights to balance them? Humans would presumably hand-tune such details using trial-and-error and common sense, but who knows what an L.L.M. might do on its own. And little mistakes will matter: overemphasize something like the price penalty and you might end up in one of the seediest hotels in the city.
A few weeks ago, Altman announced in an internal memo that the development of A.I. agents was one project, among others, that OpenAI would deëmphasize, because it wanted to focus on improving its core chatbot product. This time last year, leaders like Altman were making it sound like we’d raced over a technological cliff, and that we were tumbling chaotically toward an automated workforce. Such breathlessness now seems rash.
Lately, in an effort to calibrate my expectations about artificial intelligence, I’ve been thinking about a podcast interview with Karpathy, the OpenAI co-founder, from October. Dwarkesh Patel, the interviewer, asked him why the Year of the Agent had failed to materialize. “I feel like there’s some overpredictions going on in the industry,” Karpathy replied. “In my mind, this is really a lot more accurately described as the Decade of the Agent.” ♦
