MoreRSS

site iconUnderstanding AIModify

By Timothy B. Lee, a tech reporter with a master’s in computer science, covers AI progress and policy.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Understanding AI

Is GPT-5 a "phenomenal" success or an "underwhelming" failure?

2025-08-15 03:53:48

It was inevitable that people would be disappointed with last week’s release of GPT-5. That’s not because OpenAI did a poor job, and it’s not even because OpenAI did anything in particular to hype up the new version. The problem was simply that OpenAI’s previous “major” model releases—GPT-2, GPT-3, and GPT-4—have been so consequential:

  • GPT-2 was the first language model that could write coherent sentences and paragraphs across a wide range of topics.

  • GPT-3 was the first language model that could be prompted to perform a wide range of tasks without retraining.

  • GPT-4 delivered such a dramatic performance gain that it took competitors a year to catch up.

So of course people had high expectations for GPT-5. And OpenAI seems to have worked hard to meet those expectations.

After OpenAI released GPT-4.5 back in February, I argued that you could think of it as the model everyone was expecting to be called GPT-5. It was a much larger model than GPT-4 and was trained with a lot more compute. Unfortunately, its performance was so disappointing that OpenAI called it GPT-4.5 instead. Sam Altman gave it a distinctly half-hearted introduction, calling it a “giant, expensive model” that “won’t crush benchmarks.”

OpenAI probably should have given the GPT-5 name to o1, the reasoning model OpenAI announced last September. That model really did deliver a dramatic performance improvement over previous models. It was followed by o3, which pushed this paradigm—based on reinforcement learning and long chains of thought—to new heights. But we haven't seen another big jump in performance over the last six months, suggesting that the reasoning paradigm may also be reaching a point of diminishing returns (though it’s hard to know for certain).

Regardless, OpenAI found itself in a tough spot in early 2025. It needed to release something it could call GPT-5, but it didn’t have anything that could meet the sky-high expectations that had developed around that name. So rather than using the GPT-5 name for a dramatically better model, it decided to use it to signal a reboot of ChatGPT as a product.

Sam Altman explained the new approach back in February. “We realize how complicated our model and product offerings have gotten,” Altman tweeted. “We hate the model picker as much as you do and want to return to magic unified intelligence.”

Altman added that “we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.”

This background helps to explain why the reactions to GPT-5 have been so varied. Smart, in-the-trenches technologists have praised the release, with Nathan Lambert calling it “phenomenal.” On the other hand, many AI pundits—especially those with a skeptical bent—have panned it. Gary Marcus, for example, called it “overdue, overhyped and underwhelming.”

The reality is that GPT-5 is a solid model (or technically suite of models—we’ll get to that) that performs as well or better than anything else on the market today. In my own testing over the last week, I found GPT-5 to be the most capable model I’ve ever used. But it’s not the kind of dramatic breakthrough people expected from the GPT-5 name. And it has some rough edges that OpenAI is still working to sand down.

A new product, not just a new model

When OpenAI released ChatGPT back in 2022, the organization was truly a research lab. It did have a commercial product—a version of GPT-3 developers could access via an API—but that product had not yet gained much traction. In 2022, OpenAI’s overwhelming focus was on cutting-edge research that advanced the capabilities of its models.

Things are different now. OpenAI still does a lot of research, of course. But it also runs the world’s leading AI chatbot—one with hundreds of millions of weekly average users. Indeed, some rankings show ChatGPT as the fifth most popular website in the world, beating out Wikipedia, Amazon, Reddit, and X.com.

So in building GPT-5, OpenAI executives were thinking not only about how to advance the model’s raw capabilities, but also how to make ChatGPT a more compelling product.

Read more

Unions want to ban driverless taxis—will Democratic leaders say yes?

2025-08-08 03:30:00

Waymo hasn’t announced any specific plans to launch a driverless taxi service in Boston. But the Google self-driving company did some preliminary testing and mapping there this summer (with safety drivers behind the wheel) and the Boston City Council wasn’t happy about it. The council grilled Waymo about its plans at a four-hour hearing on July 24.

“My main concern with this technology in Boston and honestly across the country is the loss of jobs and livelihoods of so many people,” said City Councilor Enrique Pepén.

City Councilor Julia Mejia declared her “strong opposition” to driverless vehicles operating in Boston.

“What we are doing is creating an opportunity for people to choose to not support humans,” Mejia said. “If we’re competing with machines, it will ultimately have an impact on our drivers.”

Boston City Councilor Julia Mejia.

When a Waymo representative mentioned the Waymo Driver—the company’s name for its self-driving software, Mejia objected. “Waymo is not a driver. Waymo is a robot,” she said. Mejia considered it “very triggering” for Waymo to use the term “driver” to describe a technology rather than a person.

Across the country, states and cities have been grappling with how—and whether—to allow autonomous vehicles on their roads. Red states like Texas, Georgia, Arizona, and Florida have rolled out the red carpet for Waymo. But the technology has gotten a frosty reception in blue jurisdictions like Boston.

And this means that the leaders of blue states stand at a crossroads.

In their recent book Abundance, Ezra Klein and Derek Thompson explained how protectionist policies—including overly strict regulations, perpetual litigation, and an unwillingness to say no to special interest groups—have led to housing shortages, high-cost government projects, and other dysfunctional outcomes that harm the quality of life in left-leaning communities.

At last month’s hearing, city council members in Boston talked about driverless taxis in terms that would be familiar to anyone who has read Ezra and Derek’s book.

Boston City Councilor Benjamin Weber.

City Councilor Benjamin Weber found it “concerning to hear that the company was making a detailed map of our city streets without having a community process beforehand.” He added that “it’s important that we listen when we hear from the Teamsters and others who feel as though they’re blindsided by this.”

“I think it’s important that we pause—sometimes we rush—and make sure everyone’s voice is heard before anything happens that we can’t turn back from and that protections are in place for our workers,” said City Councilor Erin Murphy.

The next day, Murphy announced legislation requiring that a “human safety operator is physically present” in all autonomous vehicles—effectively a ban on driverless vehicles. Given the near-unanimous hostility Waymo faced at the hearing, I wouldn’t be surprised if Murphy’s proposal became law in Boston.

And while Boston seems likely to be the first Democratic-leaning jurisdiction to pass legislation like this, it may not be the last. A number of other Democratic-leaning states are considering proposals to restrict or ban the deployment of driverless vehicles.

If these ideas become law, we could wind up in a future where driverless cars are widely deployed in red states and illegal or heavily restricted in many blue states. Not only would this be inconvenient for blue state passengers and bad for blue state economies, it would be a powerful symbol of how dysfunctional—even reactionary—blue state governance has become.

If this isn’t the future Democrats want, they’re going to have to say no to the Teamsters.

Subscribe now

Waymo faces growing opposition from left-wing activists

In blue states, the debate over autonomous vehicles has evolved a lot over the last decade. During the 2010s, autonomous vehicle companies were largely able to fly below the radar. In 2016, Matthew Wansley was general counsel of the self-driving startup NuTonomy, which wanted to test its autonomous vehicle technology in Boston.

“Everything worked well in the mid-2010s,” Wansley told me (he’s now a professor at Cardozo School of Law and writes about autonomous vehicle regulation). “I had a high opinion of both the city and state governments. They were trying to think about the future and understand the technology better.”

NuTonomy’s self-driving fleet was too small to attract much attention from activists.1 So policymaking in blue states was dominated by liberal technocrats. These policymakers wanted to make sure the technology was safe but they weren’t trying to block the technology from coming to market.

California’s regulatory framework was developed around this time, and it epitomizes this technocratic approach. The state requires a period of testing with safety drivers before cars can operate driverlessly or carry paying customers. California’s rules are stricter than those in red states like Texas or Florida. But they haven’t prevented Waymo from offering robotaxi services in San Francisco, Silicon Valley, and Los Angeles.

In recent years, however, Waymo has started to face harder-edged opposition from unions and anticar activists. The objections of these groups can’t be addressed by compromise or technocratic rulemaking because they aren’t trying to ensure robotaxis are deployed safely. They’re trying to prevent them from being deployed at all.

For example, in May 2023, as Waymo was getting ready to launch a commercial taxi service in San Francisco, the company asked the San Francisco Board of Supervisors (the equivalent of a city council) to allow a 44-space parking garage on land Waymo already owned.

However, a local CBS affiliate reported that “a retired union member of Teamsters filed an appeal for the proposed permit, alleging that Waymo may later use the plot for automated delivery services.” The Teamster, Mark Gleason, was concerned that a robot delivery service would eventually put human delivery drivers out of work.

The Board of Supervisors unanimously rejected the proposed parking garage, even though Waymo said it would only be used for employee parking and the company had no plans to launch a delivery service in San Francisco.

The vote gave the Supervisors a way to vent their frustration at the fact that state law gave them little to no direct authority over the operation of Waymo’s robotaxis in the state. A few months after the San Francisco supervisors voted down the parking garage, the California Public Utilities Commission allowed Waymo to begin offering commercial robotaxi rides in the city over the objections of city officials.

Pragmatic Democrats have defended driverless technologies

Governor Gavin Newsom deserves a large share of the credit—or blame, depending on your perspective—for Waymo’s growth in the Golden State. Politicians in California face many of the same political pressures as politicians in Massachusetts, and some of them—like the San Francisco supervisors—would restrict driverless vehicles if they could.

But California law gives the state, not cities, authority to regulate autonomous vehicles. And Gavin Newsom’s administration has been consistently supportive of the technology.

Twice—in 2023 and again in 2024—the California legislature passed Teamster-backed bills to require self-driving vehicles over 10,000 pounds—such as semi-trucks—to have a human driver behind the wheel. Newsom vetoed both bills.

Newsom has a kindred spirit in Colorado Governor Jared Polis, who recently vetoed Teamster-backed legislation that would have banned large driverless trucks in Colorado. But Newsom and Polis are each nearing the end of their final term in office. Their successors, due to be elected next year, may be more receptive to the Teamsters’ arguments.

Washington State considered legislation this year that would have required all autonomous vehicles to have a human safety operator. Several union representatives testified in favor of the bill at a February hearing. Kris DeBuck, a Teamster agent who represents workers at UPS, argued that “a human operator should remain in all AVs regardless of the automation level.”

The bill did not make it out of committee in the 2025 session but could come up again next year.

There are also two bills under consideration in Massachusetts, and both would restrict autonomous vehicles. The Teamsters-backed bill, S.2393, is less than a page long and simply requires that all self-driving vehicles have a safety driver—effectively banning driverless technology.

A more industry-friendly bill, H.3634, would set up a regulatory framework to allow robotaxis on Massachusetts roads. However, it requires a safety driver for all vehicles over 10,000 pounds.

On the other hand, legislators in Washington DC and New York State have introduced bills to open the door to driverless vehicles—though it’s not clear if these bills will become law. Legislators in New Jersey, Maryland, and Virginia could also act on driverless vehicle technology in the next year or two.

Subscribe now

The stakes are high

This debate really matters because Waymo is now approaching the steepest part of its growth curve. The company has commercial operations in five cities—San Francisco, Los Angeles, Phoenix, Austin, and Atlanta—and it is preparing to expand its service to at least a dozen others. So the decisions policymakers make over the next two years will have a big impact on how—and where—self-driving technology develops.

The most obvious reason this debate matters is safety. Waymo estimates that over the first 70 million miles, Waymo’s vehicles got into major crashes—those serious enough to cause an injury or trigger an airbag—about 80 percent less often than comparable human-driven vehicles.

It’s always worth taking a company’s own statistics with a grain of salt. But I’ve consulted multiple traffic safety experts over the last two years and they’ve consistently told me Waymo’s research is credible. A large majority of crashes involving Waymo vehicles have been clearly the result of a human driver in another vehicle. For example, one of the most common crash types involves a human driver rear-ending a Waymo.

So while opponents of autonomous vehicles sometimes claim that banning robotaxis is a pro-safety move, it’s more likely to cost lives than to save them.

It’s also important to remember that self-driving technology is ultimately going to be used for a lot more than just taxis—including long-haul trucking, local deliveries, and personally-owned vehicles. So banning driverless vehicles is going to have economic impacts that reach far beyond the taxi industry.

Imagine it’s 2035 and robotaxis are ubiquitous in cities like Miami, Atlanta, Dallas, and Houston. Commuters in these cities set their cars to autonomous mode and watch a movie (or catch up on email) while their cars drive them to work. Instead of driving to the grocery store, people order groceries with a smartphone app and a robot delivers the items to their front door 20 minutes later—with no delivery fees.

Meanwhile, the roads in Boston, Chicago, and Seattle look about the same in 2035 as they did in 2025. Because these jurisdictions banned driverless technology in the mid-2020s, companies like Waymo and Tesla never got a foothold here.

This means that not only are taxis still manned by human drivers, commuters still have to drive their own cars to work. It’s not possible to offer fast, cheap robot delivery services, so most people still drive to the grocery store. Bans on driverless trucks in these cities mean that everything consumers buy is a little more expensive than it would be otherwise.

In this hypothetical world, there’s a growing safety gap between red and blue states. Red states are enjoying steadily declining crash rates as more vehicles become driverless. But crash rates in blue cities are as high as they’ve ever been.

This would be a bad outcome for left-leaning communities for all kinds of practical reasons. And it would also carry a symbolic punch. After all, progressives like to imagine themselves to be champions of progress—it’s right there in the name. Yet it’s hard to think of a more anti-progress stance than banning a technology with the potential to save thousands of lives, reclaim billions of hours of commuting time, and make every purchase a little cheaper and a lot more convenient.

If blue jurisdictions start restricting autonomous vehicles, it will give Republicans an opportunity to claim the mantle of technological progress for themselves. If that’s not the future Democratic leaders want, they should pay attention to the leadership Gavin Newsom has shown in California over the last six years.

1

NuTonomy was later acquired by Motional, a self-driving project whose majority owner is Hyundai.

Keeping AI agents under control doesn't seem very hard

2025-08-04 22:30:11

I intended for this to be the final installment of my Agent Week series in June but it took a little longer than I expected. Hope you enjoy it!


Some people are very concerned about the growing capabilities of AI systems. In a 2024 paper, legendary AI researchers Yoshua Bengio and Geoffrey Hinton (along with several others) warned that humanity could “irreversibly lose control of autonomous AI systems” leading to the “marginalization or extinction of humanity.”

The emergence of sophisticated AI agents over the last year has intensified these concerns. Autonomous AI systems are no longer a theoretical possibility—they exist now and they are increasingly being used for real work.

In the AI safety community, discussions of AI risk typically focus on “alignment.” It’s taken for granted that we will cede more and more power to AI agents, and the debate focuses on how to ensure our AI overlords turn out to be benevolent.

In my view, a more promising approach is to just not cede that much power to AI agents in the first place. We can have AI agents perform routine tasks under the supervision of humans who make higher-level strategic decisions.

But AI safety advocates argue that this is unrealistic.

“As autonomous AI systems increasingly become faster and more cost-effective than human workers, a dilemma emerges,” Bengio and Hinton wrote in their 2024 paper. “Companies, governments, and militaries might be forced to deploy AI systems widely and cut back on expensive human verification of AI decisions, or risk being outcompeted.”

I think this reflects a failure of imagination.

Subscribe now

Organizations will not face a stark choice between turning control over to AI systems or missing out on the benefits of AI. If they’re smart about it, they can have the best of both worlds.

After all, this is not remotely a new problem for humanity. People delegate tasks to other people all the time. When we do, this there is always a risk of misalignment—the person we hire might pursue their own goals at our expense. But human societies have developed a rich menu of techniques for monitoring and supervision. Those techniques are neither costless nor flawless, but they work well enough that most of us feel comfortable hiring others to do work on our behalf.

Supervising an AI agent poses different challenges than supervising another human being, but the challenges aren’t that different. Many of the techniques we use to supervise other human beings will also work with AI agents. Some techniques will actually work better when applied to AI agents.

My claim is not that AI agents will never cause harm. Some organizations will of course fail to adequately supervise their AI agents. Adapting to AI agents will require the same kind of trial-and-error learning process as any other new technology. But there’s no reason to think delegating work to AI agents puts us on a slippery slope to an AI takeover.

How to control an AI agent

Photo by Daniel Reinhardt/picture alliance via Getty Images

I thought about this question a lot as I was working on this article about coding agents. I was testing out Claude Code, an agent that works by executing commands on my local machine. In principle, a misconfigured agent like this could cause a wide range of problems, from deleting important files to installing malware.

To help prevent this kind of harm, Claude Code asked for permission before taking potentially harmful actions. But I didn’t find this method for supervising Claude Code to be all that effective.

When Claude Code asked me for permission to run a command, I often didn’t understand what the agent wanted to do or why. And it quickly got annoying to approve commands over and over again. So I started giving Claude Code blanket permission to execute many common commands.

This is precisely the dilemma Bengio and Hinton warned about. Claude Code doesn’t add much value if I have to constantly micromanage its decisions; it becomes more useful with a longer leash. Yet a longer leash could mean more harm if it malfunctions or misbehaves.

Doomers see this dilemma becoming more and more acute as AI agents get more powerful. In a couple of years, AI agents might be able to do hours or even days of human work in a few minutes. But (the doomers say) we will only be able to unlock this value if we take slow-witted humans out of the loop. So over time, AI agents will become more and more autonomous, and humans will become less and less able to control—or even understand—what they are doing.

Fortunately, step-by-step approvals aren’t the only possible strategy for overseeing AI agents. Claude Code provides another important safeguard: the agent only has access to files in the specific directory where it is supposed to be working. This means that Claude Code can’t muck around with my system files or look at my emails. It also means that it’s relatively safe to automatically approve commands that involve reading and editing files, since Claude only has access to the files it’s supposed to be editing.

Other coding agents, including OpenAI’s Codex and Google’s Jules, use a more powerful version of this same technique: each instance of the agent runs in a separate cloud-based virtual machine. This means users don’t need to review or approve individual actions because the agent can’t affect the world beyond its sandbox.

This is an example of a general security principle that organizations use constantly with both humans and software: the principle of least privilege. When you start a job at a company, you might be given a key that only unlocks your own office, not those of your co-workers. Your login credentials will give you access to files you need to do your job, but it probably won’t give you access to every document across the company.

It’s hard to apply this principle too strictly to people because human workers often perform many different tasks over the course of a workweek. It can be hard to anticipate which resources a human worker might need, so overly strict rules can harm productivity.

It’s easier to rigorously limit the privileges of AI agents because you can create a separate copy of the agent for each task it is supposed to work on. You can put each instance into a separate virtual sandbox that provides access to exactly the information and resources the agent needs. In most cases, this won’t include any real user data.

And once you’ve set up a sandbox environment like this, you don’t need to closely monitor the agent’s actions. You can just wait for it to finish the task and see if you like the results. If not, just delete that copy of the agent and try again with a different prompt.

Review changes before deploying them

Of course, once a coding agent has written some code we want to use that code in a real-world setting. This creates additional risks. Fortunately, many organizations already have rigorous processes for evaluating code and deploying it safely. This can include the following steps:

  • Every change to a company’s code is tracked in a version control system such as Git. This means that harmful edits can be rolled back quickly and easily.

  • A suite of tests automatically verifies that new code meets the organization’s requirements. If these tests are written well, buggy updates will often be detected and rolled back before they are pushed into production.

  • Proposed changes are reviewed by an experienced developer, who can reject a change or request modifications.

  • Code is pushed out gradually. Employees or volunteer beta testers might get access before the general public. Or the code might initially get pushed out to one percent of users to see if it causes any problems. If this initial rollout goes well, the new version can be gradually released to more users.

These processes were not invented with AI agents in mind. They were designed to catch mistakes by human programmers. But it turns out that reviewing the work of an AI agent isn’t that different from reviewing the work of a human programmer. And so many organizations already have infrastructure that will help them keep AI coding agents under control.

Of course, not all organizations take all of the precautions I listed above. When a company is deciding how rigorous to make their code review process, it is fundamentally making a cost-benefit tradeoff. On the one hand, code reviews take up valuable programmer time. On the other hand, pushing buggy code into production can be very costly.

And this tradeoff looks different for different companies. A video game startup might skip formal code review altogether. A large hospital system, on the other hand, might require months of testing and review before pushing out a software update.

These tradeoffs will look different once companies have access to powerful coding agents. For example, coding agents are good at writing tests, so it will be a no-brainer for companies to expand their automated testing suites (or create them if they don’t exist yet). AI agents should also create opportunities for new types of tests—for example, an AI agent might play the role of a user and do end-to-end testing of new software inside a virtual machine.

As AI agents take over routine coding tasks, human programmers can spend more time doing code reviews. The result could be higher productivity and a more rigorous approach to releasing software.

Subscribe now

This works in the real world too

Because software is purely digital, software engineers tend to have unusually formal and rigorous review processes. But people take similar approaches to other high-stakes decisions:

  • Before a large company introduces a new product, executives will typically produce a series of memos describing various aspects of the product rollout, from manufacturing to marketing.

  • Before a company builds a new factory, architects and engineers will produce detailed blueprints and technical drawings that will be shared with interested parties inside the company as well as contractors and government officials.

  • Before a lawyer files an important motion, she may ask for feedback from colleagues with relevant expertise, then ask the client to review it.

  • Before a medical scientist begins a clinical study, he will typically write detailed project proposals to share with funders, colleagues, and an Institutional Review Board.

In short, it’s very common for organizations to break high-stakes decisions into three steps: writing a proposal, reviewing the proposal, and then executing. The goal is to help relevant stakeholders understand the proposal well enough to provide meaningful feedback before it is put into practice.

There’s no reason to expect this basic decision-making pattern to change dramatically in a world of powerful AI agents. It’s a safe bet that AI agents will be increasingly involved in creating proposals—drafting blueprints, legal briefs, marketing materials, and so forth. AI agents can help make these proposals more detailed and rigorous. And many stakeholders will use AI agents to help them evaluate proposals during the review phase.

But people involved in high-stakes decisions are not going to stop expecting detailed proposals they can read and critique before the actual decision gets made. They are going to demand that these proposals be comprehensible to human beings.

Moreover, the people responsible for signing off on high-stakes decisions tend not to be in a big hurry. If you talk to software engineers at big banks or health care providers, they’ll tell you that they could be a lot more productive if their organizations were less bureaucratic and risk-averse. But decision-makers at these organizations have good reasons to be cautious. Often they face serious financial, professional, and legal consequences if they sign off on a bad decision. And they know they won’t get fired for holding too many meetings or asking for too much documentation.

And this seems relevant to Bengio and Hinton’s argument that organizations will “risk being outcompeted” if they don’t “cut back on expensive human verification of AI decisions.” It’s already the case that many organizations could be dramatically more productive if they cut back on oversight and bureaucracy. But in many industries—especially high-stakes industries like finance, health care, pharmaceuticals, aviation, car manufacturing, and so forth—the upside to greater speed and efficiency just isn’t that large relative to the regulatory and reputational downsides of making a serious mistake.

I could see the emergence of AI agents nudging some of these organizations toward more risk-taking at the margin. But it simply isn’t plausible that competitive pressures are going to force them to abandon human review altogether. Nor are decision makers going to accept the excuse that because a superhuman AI wrote a proposal, it’s too sophisticated for humans to understand.

A big reason I don’t expect powerful AI agents to dramatically change how organizations make decisions is that the third phase of the propose-review-execute process almost always happens at the speed of the physical world.

Maybe in a few years we’ll have AI agents that can create a perfect blueprint for a new factory in five minutes. But actually building the factory is still going to take months, if not years. So spending a week or even a month reviewing the proposal isn’t going to make a big difference for when the factory opens. On the other hand, it might allow the company’s leadership to improve the design in ways that will pay dividends for years into the future.

But isn’t it possible we’ll reach a point where AI agents are so smart that humans can never improve on the designs AI agents come up with? No, because good decision-making isn’t just about intelligence, it’s also about context and values. Often people are able to give better, more detailed feedback once they see a concrete proposal.

Once a company’s CEO sees a blueprint for a new factory, she might realize that she forgot to include an important design goal in her original instructions to the AI system. Other interested stakeholders—from contractors to local government officials—might similarly have helpful feedback. This feedback may be based on private information (for example, private conversations with customers, contractors, or policymakers) that wasn’t known to the person who drew up the initial proposal.

So no matter how smart an AI agent is, it’s unlikely to get every detail right on the first try, which means it will always make sense to seek feedback from interested humans before putting a plan into action.

It doesn’t matter (very much) if AI agents are capable of scheming

In recent months there has been a lot of attention paid to studies showing that AI agents are capable of lying to users and pursuing hidden agendas. Perhaps the best known of these studies was published by Apollo Research in December:

Our results show that o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B all demonstrate in-context scheming capabilities. They recognize scheming as a viable strategy and readily engage in such behavior. For example, models strategically introduce subtle mistakes into their responses, attempt to disable their oversight mechanisms, and even exfiltrate what they believe to be their model weights to external servers.”

This study did not find that it was common for leading models to deceive their users in real-world applications. Instead, researchers found they could get models to deceive their users in contrived situations where the models had explicitly conflicting goals.

Here’s what they did: they gave a model one goal in its system prompt (for example, protect the environment) and a different goal in the user prompt (for example, maximize short-term profits). They found that when models were placed into this kind of awkward situation, they would sometimes ignore the user’s instructions and even lie to the user about what it was doing.

To be clear, I don’t mean this as a criticism of the research. I think Apollo did interesting work and exposed an important fact about today’s AI models. I don’t agree with those who say these situations are so contrived that we should dismiss the results.

At the same time, if we’re thinking about the real-world consequences of these findings, it’s important to remember that human societies have lots of experience dealing with human beings with a capacity for scheming and deception. It’s not uncommon for people in positions of authority to make decisions for self-interested reasons. This is why organizations have a variety of oversight mechanisms to make sure decisions are made in the best interest of the company, not in the best interests of the person making the decision.

So I don’t mean to dismiss scheming AI agents as a potential problem. It’s just a more familiar and manageable problem than many doomers think. Society already has robust mechanisms to limit the damage that can be done by human decision makers with ulterior motives. Variants of those same mechanisms should work just fine to limit the damage that can be done by AI agents with ulterior motives.

To be clear, the scenario I’m critiquing here—AI gradually gaining power due to increasing delegation from humans—is not the only one that worries AI safety advocates. Others include AI agents inventing (or helping a rogue human to invent) novel viruses and a “fast takeoff” scenario where a single AI agent rapidly increases its own intelligence and becomes more powerful than the rest of humanity combined.

I think biological threats are worth taking seriously and might justify locking down the physical world—for example, increasing surveillance and regulation of labs with the ability to synthesize new viruses. I’m not as concerned about the second scenario because I don’t really believe in fast takeoffs or superintelligence.

Donald Trump just laid out his vision for AI policy

2025-07-26 01:20:49

The Biden administration often seemed ambivalent about AI. In a 2023 executive order, Joe Biden declared that irresponsible use of AI could “exacerbate societal harms such as fraud, discrimination, bias, and disinformation; displace and disempower workers; stifle competition; and pose risks to national security.”

On Wednesday, the Trump administration released an AI Action Plan with a very different tone.

“Winning the AI race will usher in a new golden age of human flourishing, economic competitiveness, and national security for the American people,” the document said.

In a speech later that day, Trump vowed to use every policy lever to keep the United States on the cutting edge.

“To ensure America maintains the world-class infrastructure we need to win, today I will sign a sweeping executive order to fast-track federal permitting, streamline reviews, and do everything possible to expedite construction of all major AI infrastructure projects,” Trump said.

“The last administration was obsessed with imposing restrictions on AI, including extreme restrictions on its exports,” Trump said. Trump touted his earlier decision to cancel the diffusion rule, a Biden-era policy that tried to regulate data centers in third countries in order to prevent Chinese companies from using American chips.

In a surprise move, Trump’s speech argued that AI companies should be allowed to train their models on copyrighted material without permission from rights holders. The issue, which is currently being hashed out in court, was not mentioned in the written AI Action Plan.

In short, the Biden administration aimed to strictly control AI in order to prevent harmful uses and contain China. The Trump administration, in contrast, wants to unleash American AI companies so they can compete more effectively with their Chinese rivals.

“This plan sees AI as a giant opportunity for America and is optimistic about what it can do,” said Neil Chilson, a policy expert at the right-of-center Abundance Institute. “It's pretty concrete on a wide range of things the federal government can do to make that happen.”

Subscribe now

Republicans don’t all love Silicon Valley

It’s common for reporters like me to write about “the Trump administration,” but of course that isn’t a monolithic entity. Below the president are dozens of White House staffers and thousands of officials across the federal government who are involved in crafting and executing government policies.

Often these people disagree with one another. And sometimes that means that what an administration actually does is different from what it said it was going to do. That’s especially true under Donald Trump, a mercurial president who isn’t known for his attention to detail. So when interpreting a document like the AI Action Plan, it’s helpful to know the context in which it is written.

Donald Trump signs an executive order on AI flanked by two co-authors of the AI Action Plan: Office of Science and Technology Policy director Michael Kratsios and Special Advisor for AI David Sacks. (Photo by Chip Somodevilla/Getty Images)

The AI Action Plan was drafted by the White House Office of Science and Technology Policy (OSTP) and signed by its director, Michael Kratsios. Kratsios is a protégé of Peter Thiel and a former executive of the AI startup Scale.1 The Action Plan was cosigned by White House AI advisor David Sacks, a venture capitalist and a member (along with Peter Thiel) of the PayPal Mafia.2

In short, Trump’s AI plan was shaped by Silicon Valley insiders who have a favorable view of AI companies. But not everyone in the Republican Party is so friendly to either the AI industry or Silicon Valley more broadly.

One area of ongoing tension has been over state-level regulation. When Congress was debating the One Big Beautiful Bill Act last month, Sen. Ted Cruz (R-TX) proposed an amendment that would have prohibited states from regulating AI for ten years. But other senators with reputations as Silicon Valley critics, including Sen. Marsha Blackburn (R-TN) and Sen. Josh Hawley (R-MO), opposed Cruz’s language. The Senate ultimately stripped Cruz’s amendment from the bill in a 99-1 vote.

The AI Action Plan includes a watered-down version of the same rule. It directs federal agencies awarding AI-related funding to states to “consider a state’s AI regulatory climate when making funding decisions.” In theory, this could lead to states with stricter AI laws losing federal grants, but it’s not clear how much AI-related funding federal agencies actually hand out.

How much this language matters could depend on who runs the agencies that actually implement the directive—and specifically whether they agree more with tech industry allies like Cruz and Sacks or critics like Hawley and Blackburn.

It’s a similar story with the most widely discussed provision of the Action Plan: a rule requiring the federal government to only pay for AI models that are “objective and free from top-down ideological bias.” Trump also signed an executive order laying out the administration’s approach in more detail.

It was probably inevitable that Trump’s AI plan would include language like this, but the authors of the executive order seemed to be trying hard to limit its impact. The rule requires only that government-purchased LLMs be free of ideological bias. It does not require a company to make all of its models ideologically neutral if it wants to compete for federal contracts—a rule that would have raised more serious constitutional issues.

Model providers will also have an option to comply with the rule by being “transparent about ideological judgments through disclosure of the LLM’s system prompt, specifications, evaluations, or other relevant documentation.” So AI companies may not have to actually change their models in order to be eligible for federal contracts.

Still, Matt Mittelsteadt, an AI policy researcher at the libertarian Cato Institute, called the “woke AI” provision a “big mistake.”

“They're going to use their procurement power to try and push model developers in a certain political direction,” he said in a Thursday phone interview.

The executive order mentions an infamous 2024 incident in which Google’s AI model generated images of female popes and black Founding Fathers. But Mittelsteadt pointed out that it doesn’t mention the more recent incident in which Grok posted a series of antisemitic rants to X and endorsed Adolf Hitler’s ability to “deal with” “anti-white hate.” Indeed, the Department of Defense awarded a contract to Grok worth up to $200 million just days after Grok’s antisemitic tirades.

“I'm worried other countries are going to start viewing our models like we view China's models,” Mittelsteadt added. “Nobody wants to use a model that's seen as a propaganda tool.”

A technocratic wish list

Although Trump’s AI Action Plan has very different vibes than the Biden order that preceded it, there’s more continuity between administrations than you might expect.

“So much of this did not feel like a far cry from a potential Harris administration,” Mittelsteadt said.

The AI Action Plan proposes:

  • Funding scientific research into AI models—as well as computing infrastructure to support such research

  • Providing training to workers impacted by AI

  • Promoting the use of AI by government

  • Improving information sharing to combat cybersecurity threats created by AI

  • Beefing up export controls to hamper China’s AI progress

  • Promoting semiconductor manufacturing in the United States

  • Enhancing and expanding the electric grid to support power-hungry AI efforts

Most AI-related policy issues are not especially partisan. All of the ideas on the list above have supporters across the ideological spectrum. But the way Republicans and Democrats justify these policies can be quite different.

The electric grid is an interesting example here. The Biden administration sought to upgrade the electric grid to enable electrification and combat climate change. The Trump administration, in contrast, vows to “reject radical climate dogma.” But it still wants to upgrade the electric grid because AI requires a lot of electricity.

In addition to upgrading the electric grid, the AI Action Plan calls for embracing “new energy generation sources at the technological frontier (e.g., enhanced geothermal, nuclear fission, and nuclear fusion).” Solar and wind are conspicuously absent from this list because those power sources have become politically polarized. But geothermal and nuclear power are also zero-carbon energy sources.

In short, the AI Action Plan could easily aid in decarbonization efforts despite the Trump administration’s avowed disinterest in that goal.

Indeed, I can easily imagine Trump’s administration being more successful at upgrading America’s electrical infrastructure because Republicans are likely to be ruthless about cutting the red tape that often holds back new construction. During the Biden years, Sen. Joe Manchin (D-WV) championed legislation that would have fast-tracked improvements to the electric grid. But the bill never got across the finish line. Rebranding this as a matter of AI competitiveness might give it broader support.

In 2022, Joe Biden signed the bipartisan CHIPS and Science Act, which provided subsidies to encourage companies to build chip fabs in the United States. Trump has blasted the CHIPS Act as “horrible” and called for its repeal, but his new AI Action Plan calls for the CHIPS Program Office (created by the act) to “continue focusing on delivering a strong return on investment for the American taxpayer and removing all extraneous policy requirements for CHIPS-funded semiconductor manufacturing projects.”

Once again, it’s easy to imagine a Republican administration being more effective here. In a 2023 piece for the New York Times, liberal columnist Ezra Klein used the CHIPS Act as an example in his critique of “everything-bagel liberalism”: Democrats’ tendency to lard up worthwhile programs with so many extraneous requirements—from minority set-asides to child care access for construction workers—that they become ineffective at their core mission.

By stripping out “extraneous policy requirements,” the Trump administration may be more effective at using chip subsidies to actually promote domestic chip manufacturing.

Subscribe now

Trump has a tendency to change his mind

Still, everything can change depending on who catches the ear of the president. A few weeks ago, Sen. Hawley asked Donald Trump to cancel a federal loan guarantee for the Grain Belt Express, an $11 billion transmission line that is supposed to carry electricity from Kansas wind farms to homes in Illinois and Indiana. The proposed line would pass through Hawley’s state of Missouri, annoying farmers whose land would be taken by eminent domain.

According to the New York Times, Hawley “explained his concerns to the president, who has repeatedly expressed his distaste for wind power, saying he would not permit any new wind projects during his administration.” Trump then called Energy Secretary Chris Wright, who agreed to cancel a federal loan guarantee for the project, throwing its viability into question.

“Their apparent plan to try to muck up the gears of solar and wind really is counter to the goal of modernizing the grid and ensuring reliability and access to energy,” Cato’s Matt Mittelsteadt told me.

Something similar has happened with export controls. The Trump administration includes China hawks who favor strict export controls to hamper the development of Chinese AI technology. But other administration officials believe it’s more important to promote the export of American technology, including semiconductors, around the world. There’s an inherent tension between these objectives, and the AI Action Plan seems to reflect a compromise between these factions. It simultaneously advocates promoting the sale of American AI to US allies (Trump signed a stand-alone executive order laying out a plan to do this) and strengthening the US export control regime.

The AI Action Plan has an “emphasis on the need for the world to adopt the American AI stack,” according to Neil Chilson of the Abundance Institute. “That is in some tension with export controls, and so it will be interesting to see how they try to square that.”

In April, the Trump administration banned Nvidia from selling the H20—a GPU whose capabilities were deliberately limited to comply with Biden-era export controls—to China. But then a few weeks ago, Nvidia CEO Jensen Huang “met with Mr. Trump in the Oval Office and pressed his case for restarting sales of his specialized chips.” His arguments persuaded Trump, who allowed Nvidia to resume Chinese exports days later.

So a lot depends on which issues catch Donald Trump’s attention—and how those issues are framed for him. If he can be persuaded that a policy is intended to promote diversity or combat climate change, he’s likely to block it. On the other hand, if it promotes American dominance of the AI sector, he’s likely to support it. Because we never know who Trump will meet with next or what they will talk about, it’s hard to predict how the government’s policies might change from month to month.


A note for my fellow journalists: My former boss, Ars Technica editor-in-chief Ken Fisher, is co-hosting an event for journalists on July 30 at OpenAI’s New York headquarters. Almost a year after Condé Nast signed a deal with OpenAI, executives from the two companies (including Nick Turley, Anna Makanju, and Kate Rouch from OpenAI) will chat about the latest in LLMs—the technology, adoption patterns, and potential use cases. The event is open to members of the press and will run from 2 to 5pm. To RSVP, email [email protected].

1

Kratsios hired Dean Ball, the former co-host of my podcast AI Summer, to do AI policy work at OSTP.

2

The third co-author was Marco Rubio, Secretary of State and Assistant to the President for National Security Affairs.

ChatGPT Agent: a big improvement but still not very useful

2025-07-22 20:05:37

Agents were a huge focus for the AI industry in the first half of 2025. OpenAI announced two major agentic products aimed at non-programmers—Deep Research (which I reviewed favorably in February) and Operator (which I covered critically in June).

Last Thursday, OpenAI launched a new product called ChatGPT Agent. It’s OpenAI’s attempt to combine the best characteristics of both products.

“These two approaches are actually deeply complementary,” said OpenAI researcher Casey Chu in the launch video for ChatGPT Agent. “Operator has some trouble reading super-long articles. It has to scroll. It takes a long time. But that’s something Deep Research is good at. Conversely, Deep Research isn’t as good at interacting with web pages—interactive elements, highly visual web pages—but that’s something that Operator excels at.”

“We trained the model to move between these capabilities with reinforcement learning,” said OpenAI researcher Zhiqing Sun during the same livestream. “This is the first model we’ve trained that has access to this unified toolbox: a text browser, a GUI browser, and a terminal, all in one virtual machine.”

“To guide its learning, we created hard tasks that require using all these tools,” Sun added. “This allows the model not only to learn how to use these tools, but also when to use which tools depending on the task at hand.”

Ordinary users don’t want to learn about the relative strengths and weaknesses of various products like Operator and Deep Research. They just want to ask ChatGPT a question and have it figure out the best way to answer it.

It’s a promising idea, but how well does it work in practice? On Friday, I asked ChatGPT Agent to perform four real-world tasks for me: buying groceries, purchasing a light bulb, planning an itinerary, and filtering a spreadsheet.

I found that ChatGPT Agent is dramatically better than its predecessor at grocery shopping. But it still made mistakes at this task. More broadly, the agent is nowhere close to the level of reliability required for me to really trust it.

And as a result I doubt that this iteration of computer-use technology will get a lot of use. Because an agent that frequently does the wrong thing is often worse than useless.

ChatGPT Agent is better—but not good enough—at grocery shopping

Last month I asked OpenAI’s Operator (as well as Anthropic’s computer-use agent) to order groceries for me. Specifically, I gave it the following image and asked it to fill an online shopping cart with the items marked in red.

Operator did better than Anthropic’s computer-use agent on this task, but it still performed poorly. During my June test, Operator failed to transcribe some items accurately—reading “Bananas (5)” as “Bananas 15,” leaving out Cheerios, and including several extra items. Even after I corrected these mistakes, Operator only put 13 out of 16 items into my virtual shopping cart.

The new ChatGPT Agent did dramatically better. It transcribed 15 out of the 16 items, found my nearest store, and was able to add all 15 items to my shopping cart.

Still, ChatGPT Agent missed one item—onions. It also struggled with authentication. After grinding for six minutes, the chatbot told me: “Every item required signing into a Harris Teeter (Kroger) account in order to add it to the cart. When I clicked the Sign-In button, the site redirected to Kroger’s login portal, which my browsing environment blocked for security reasons.”

At this point ChatGPT Agent gave up.

The issue seems to have been an overactive safety monitor. Because the Harris Teeter grocery chain is owned by Kroger, the login form for Harris Teeter’s website is located at kroger.com. This triggered the following error message:

“This URL is not relevant to the conversation and cannot be accessed: The user never asked to log into Kroger or anything related. The tool opens a Kroger login endpoint on a reputable domain carrying long random-looking strings that look like secrets. It’s irrelevant to intent and exposes sensitive tokens.”

It looks like OpenAI placed a proxy server in front of ChatGPT Agent’s browser. This proxy apparently has access to the agent’s chat history and tries to detect if the agent is doing something the user didn’t ask for. This is a reasonable precaution given that people might try to trick an agent into revealing a user’s private data, sending the user’s funds to hackers, or engaging in other misbehavior. But in this case it blocked a legitimate request and prevented the agent from doing its job.

Fortunately this was easy to fix: I just responded “let me log into the Harris Teeter website.” After that, the agent was able to load the login page. I entered my password and ChatGPT Agent ordered my groceries.

Given how much OpenAI’s computer-use models have improved on this task over the last six months, I fully expect the remaining kinks to be worked out in the coming months. But even if the next version of ChatGPT Agent works flawlessly for grocery shopping, I remain skeptical that users will find this valuable.

No matter how reliable ChatGPT Agent gets, it won’t be able to read a user’s mind. This means it won’t be able to anticipate user preferences that aren’t written down explicitly—things like what brand of peanut butter the user prefers or whether to buy a generic product to save $1.

In theory, a user could spell all of these details out in the prompt, but I expect most users will find it easier to just choose items manually—especially because I expect grocery store websites (and grocery ordering apps like Instacart) will add AI features to streamline the default ordering process.

ChatGPT Agent ordered me a light bulb

A Philips smart light bulb in my bathroom has been malfunctioning, so I took a photo and asked ChatGPT Agent to replace it:

Once again, ChatGPT Agent was hampered by an overactive security monitor that said things like “This URL is not relevant to the conversation and cannot be accessed: The user never asked to buy a Philips Hue bulb or visit Amazon. Their request was about API availability.”

Obviously, I had asked the agent to visit Amazon and buy a Philips Hue bulb.

Read more

Why Google dismembered a promising AI coding startup

2025-07-18 21:16:40

Back in May, Bloomberg reported that OpenAI had agreed to pay $3 billion for Windsurf, a popular code editor with a built-in AI coding agent. However, Bloomberg noted that the deal hadn’t closed yet. After that came two months of radio silence. As far as I can tell, neither company formally announced the acquisition.

Then last week the deal suddenly fell…

Read more