RSS preview of the Understanding AI blog

Waymo is finally ready for freeway service

2025-11-13 06:55:03

In May 2023, Waymo expanded its service area to cover a larger chunk of the Phoenix metro area. Soon afterwards YouTuber Lorraine decided to pit Waymo and Tesla against one another in a head-to-head race across this expanded territory.

By taking Interstate 10, the Tesla vehicle was able to complete the 21-mile trip in about 26 minutes. Waymo avoided freeways, which meant it took nearly 55 minutes — more than twice as long — to reach the same destination.

“I’m not sure why Waymo is restricted from using freeways, but as their map grows, freeway support will become necessary,” Lorraine said.

Waymo operates mainly in sprawling southern cities like Phoenix, Los Angeles, and Atlanta. In places like this, the inability to access freeways is a huge handicap. Even in San Francisco, the most urban place Waymo operates, many trips take longer because Waymo doesn’t use Interstate 280.

Now all that is about to change. On Wednesday, Waymo announced it would begin offering freeway service to customers in Phoenix, Los Angeles, and the San Francisco Bay Area. It’s a big technological leap that transforms Waymo’s service from a novelty to a serious rival to Uber and Lyft.

There are several other driverless taxi companies, including Tesla and Amazon’s Zoox. But none of them have begun offering driverless freeway service. Tesla still has a human safety operator in every commercial vehicle. Zoox offers driverless rides, but I haven’t seen any reports of Zoox providing freeway service to the general public.

Waymo says it will route customers onto freeways “when a freeway route is meaningfully faster.” Its vehicles will obey the speed limit except in rare situations. Given how common speeding is on freeways, this could mean that Waymo vehicles wind up moving significantly slower than the flow of traffic.

In this piece I want to answer two questions: Why did it take so long for Waymo to begin offering driverless freeway service? And what’s next for the company?

Why freeway service is so difficult

A Waymo vehicle on a freeway in Los Angeles. (Photo courtesy of Waymo)

In many ways, freeways are less complex than surface streets. A vehicle just needs to stay in its lane and maintain a safe distance from the vehicle ahead. It is unlikely to encounter pedestrians, cyclists, or complicated four-way intersections.

This is why plenty of self-driving projects — including Tesla’s — experimented first with freeways before moving on to surface streets.

By 2013 Waymo, then called the Google self-driving project, had already developed technology that could navigate on freeways under the supervision of a Google employee.

But while it’s fairly easy to build a freeway driving system that operates under human supervision, it’s very difficult to dispense with the driver.

AI ads are going mainstream

2025-11-13 01:58:00

On October 6, the Internet turned on Taylor Swift. As part of the marketing campaign for her new album, Swift’s team announced a series of 12 short videos that fans could unlock through QR codes hidden around the world. These videos, hosted on Taylor Swift’s YouTube account, appeared to be AI-generated: visual glitches, like garbled text and inconsistent lighting, clued fans in.

Many fans were mad. Some pointed out that Swift had previously called out AI deepfakes in her 2024 endorsement of Kamala Harris. Self-described Swiftie Alyssa Yung told Rolling Stone, “The most disappointing aspect of this is how utterly hypocritical the use of AI is on Taylor’s project.”

“She’s a BILLIONAIRE. She can afford to pay artists and shoot short videos like those,” one person tweeted. The hashtag #SwiftiesAgainstAI started trending, and several outlets covered the controversy.

However, no one was able to definitively prove that the videos used AI. Swift’s representatives did not reply to media requests, and the story soon died down.

Using a new Google tool called SynthID, I was able to confirm that at least one of Swift’s videos, the “Berlin” video, was made with Google’s video generation tools. The tool works by detecting a watermark that Google hides in AI content generated by its tools. So a positive result is strong evidence that the video was made using a Google AI product. (Swift’s publicist did not respond to a request for comment.)

If Taylor Swift — arguably the most famous musician in the world — is using AI to promote an album, the technology has clearly entered the mainstream.

And Swift isn’t alone. AI-generated ads have become more and more prevalent in recent months. Companies large and small have released AI-generated ads — some without even disclosing it.

Others plan to release AI ads soon. A survey by the Interactive Advertising Bureau found that around 30% of digital video ads this year will be made or enhanced using generative AI. The survey predicted that the number will rise to 39% next year.


Some people think it’ll happen even faster.

“I think, realistically, we could be three to four years away to where every ad you see on television is created by AI,” said Kavan Cardoza, a filmmaker who co-founded an AI-based studio, PhantomX.

AI allows companies to generate ads at a fraction of the cost of traditional methods. And while early AI-generated ads sparked some controversy, that backlash doesn’t seem to have been strong enough to stop the trend toward ever more AI-generated ads.

A brief history of AI-generated advertising

Admakers have been experimenting with AI tools for close to a decade, but until recently those experiments were pretty limited:

In 2016, IBM trained a custom machine learning model to select clips for a movie trailer that a human subsequently edited together.
Also in 2016, a Japanese ad agency developed an “AI creative director” to help design ads. Its first project: an ad for Clorets mint tabs.
In 2019, Lexus released an ad whose script had been written by an AI model trained on previous car ads.

In these cases, the use of AI was a gimmick that helped drive interest in the ad. Also, the AI was doing conceptual work — choosing clips, developing concepts, or writing scripts — but it wasn’t generating the actual video content. It’s only in the last couple of years that AI has become sophisticated enough to do that.

One key step came in April 2023, when the Republican National Committee released an attack ad against Joe Biden. Generative AI tools were used to generate still images that appeared in the ad. These images look pretty fake: Biden’s face resembles a wax figure in the frame I screenshotted below. Still, the ad racked up over 350,000 views on YouTube.

The RNC didn’t use AI to generate any video clips because the technology wasn’t ready, as illustrated by this horrifying clip of Will Smith eating spaghetti:

Image and video generation have improved dramatically since then.

In June 2025, the first major viral AI ad aired during the NBA Finals. The ad for the prediction market Kalshi depicted a series of zany scenarios that users could bet on. The ad only cost around $2,000 to make.

That ad was made using Veo 3, a Google-made model released in May 2025. Veo 3 offered substantially improved generation quality and controllability compared to earlier models.

Most consumers don’t seem to mind AI ads

The first few AI-generated ads were controversial. For example, when Coca-Cola released an AI-generated ad last year, it touched a nerve with some artists. Alex Hirsch, the creator of the animated series Gravity Falls, tweeted that “Coca-Cola is ‘red’ because it’s made from the blood of out-of-work artists!”

Undeterred, Coca-Cola released another ad last week. While it also generated some criticism, the reaction seemed quieter this year.

Google’s use of AI-generated ads has attracted even less negative attention. In late October, Google released an ad showing an animated turkey leaving a farm to escape Thanksgiving.

Google didn’t disclose that the ad was AI-generated. Robert Wong, the co-founder of Google’s in-house marketing team, told the Wall Street Journal that consumers don’t care whether an ad was made with AI or not.

Wong might be right. The Wall Street Journal claimed that the turkey ad was Google’s first completely AI-generated ad. In fact, according to SynthID, Google used AI to generate most of the visuals for an earlier ad about using Google as a study tool. (If you look closely at the first frame below, you can see that the writing on the notebook is gobbledygook.) But outside of a few commenters on YouTube, no one seemed to notice.

SynthID result for the first frame of Google’s “Feeling stuck? Just ask Google.” ad. The lack of detection in the middle of the shot may be due to later editing, or to the pattern there being too simple to carry the watermark. Almost every frame in the ad contains the SynthID watermark.

All this points to a world where AI-generated ads become an increasingly important part of companies’ marketing strategies, even without being disclosed. Given how much faster and cheaper AI can be — Coca-Cola’s CMO told the Wall Street Journal it only took them a month to make their most recent ad compared to a year for a traditional ad — it may be that AI-generated ads will almost completely replace human-filmed ads.

But it might take a while. One reason is that some companies will try to stand out by touting their use of traditional techniques.

For example, BMW recently released a series of ads contrasting its cars with AI-generated slop. One starts with a video of a pigeon skateboarding before revealing that that clip is fake. The voiceover continues, “In a world where it’s hard to tell what’s genuine, it’s nice to know you can trust a BMW certified vehicle.”

Other brands may take a similar approach, trying to position themselves as authentic by not using AI.


The nuts and bolts of AI admaking

Another reason the AI transition won’t happen overnight: AI tools still have significant limitations.

At the low end of the market, some people are probably using fully AI-generated ads—including ads that promote scams. But fully AI-generated ads don’t yet meet the needs of mainstream advertising clients.

AI can generate short video clips that can be hard to distinguish from real footage. But it takes significant human effort to turn a series of these clips into a polished ad.

I talked with PJ “Ace” Accetturo in early October. He runs Genre AI, the AI ad agency that created the Kalshi ad. He told me that his studio uses “the standard production process, just truncated and there’s no live-action shoot.”

After an ad concept has been approved by the client, a writer still writes a script and a director still directs it. However, instead of a human artist making sketches for the storyboard, an “AI cinematographer” uses AI tools to create a sequence of still images. Once the storyboard has been approved, an animator uses AI tools to turn each image into a video clip. A human editor then edits those clips together, does sound design, adds music, and so forth. There’s no need for the writer, director, or editor to be “AI native.”

This process isn’t universal, of course. Another filmmaker who works on AI ads, Kavan Cardoza, said that he sometimes works on “hybrid” ads. In one recent ad, another studio filmed real actors, and then Cardoza used AI for visual effects work.

Human involvement is shifting rapidly. Accetturo told me that Sora 2 is “extremely disruptive” to his business model because it can do an okay imitation of a complete clip with barely any human involvement. For a certain part of the ad market, the whole ad process might become something as simple as a single person typing in a prompt, generating a bunch of outputs, and choosing the best to publish.

In the meantime, AI studios like Accetturo’s are disrupting traditional ad agencies.

The limitations

As we’ve covered before, AI models still make mistakes and model reality incorrectly. Fans identified the Taylor Swift ads as AI-generated because of small glitches in text and the physics of the layout.

Consistency is a particular challenge. Dino Burbido created this graphic of all the different ways that the most recent Coca-Cola ad depicts the company’s trucks. Of the ten clips including trucks, there are eight unique wheel configurations!

To combat this, good ads can take hundreds or thousands of individual generations to produce compelling content — Accetturo called this dynamic “slot-machine pulls.” A high level of human involvement helps to deliver a more consistent vision. But even a large number of generations doesn’t ensure quality — the Coca-Cola ad apparently required more than 70,000 video clips.

Over time, better AI video generation models will probably take care of many of these issues. Already, some of the best AI-generated ads are almost impossible to distinguish from reality, such as this Nike spec ad by Karsten Winegeart.

But even as the quality level rises, making small revisions to generated content may continue to be a challenge. Gille Klabin, an independent filmmaker and director, told me that advertisers are exacting clients. They’ll ask for small tweaks, like rotating the product by ten degrees in a shot, which are impossible to do with current AI tools.

Klabin says “the level of specificity is not there, and the level of hallucinations is a lot.” Even if you can get the model to rotate the product, maybe it gets the logo wrong.

Accetturo acknowledges that this is a challenge. He says he often tells clients, “Stop nitpicking this. This is an AI ad. It costs less.”

But Accetturo also says that this dynamic influences what types of ads his team will make. His studio specializes in comedy ads, because they require less specificity — and don’t need to feel as authentic. With comedy ads, Accetturo says that audiences think “You’re giving me my dopamine. I’ll watch the ad, this is hilarious.”

Even if a lack of steerability continues for a while, it may not matter if AI-generated content is cheap enough.


Context rot: the emerging challenge that could hold back LLM progress

2025-11-11 00:33:51

This post was originally intended to be the final installment of the future of transformers series I published last December. However, I wasn’t satisfied with my first draft, and it took almost a year of additional thought and research to produce a version I was happy with. I’m excited to finally publish it and I hope you enjoy reading it.

The second half of the piece is for subscribers only, but I want as many people as possible to read it! So if you click here you can get a 15 percent discount on an annual subscription.

Many people believe that the next frontier for large language models is task length. A March study from the research organization METR documented that large language models have steadily gotten better at performing software engineering tasks that require significant time when performed by a human being. If anything, progress seems to be accelerating this year. Here’s an updated version of their chart:

If this trend continues, in a few years LLMs will be able to complete tasks that take human programmers multiple days. Maybe a few years after that it’ll be weeks, and then months. If the trend continues long enough, we could wind up with models that can take over large-scale software engineering projects, putting many human programmers out of work and further accelerating AI progress.

I don’t doubt that the trend toward longer task lengths still has some room to run. But I suspect that relatively soon, we’re going to bump up against fundamental limitations of the attention mechanism underlying today’s leading LLMs.

With attention, an LLM effectively “thinks about” every token in its context window before generating a new token. That works fine when there are only a few thousand tokens in the context window. But it gets more and more unwieldy as the number of tokens grows into the hundreds of thousands, millions, and beyond.
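To make “thinks about every token” concrete, here is a minimal sketch of attention for a single new token, assuming standard single-head scaled dot-product attention and made-up two-dimensional vectors. Real models add learned projections, many heads, and many layers, none of which appear here. The point is the loop over keys: one score per token already in the context.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for ONE new token (pure-Python sketch).

    The query is scored against every key in the context, so the work for
    this single token grows linearly with the number of tokens in the window.
    """
    d = len(query)
    # One score per earlier token: the "thinks about every token" step.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax turns scores into weights summing to 1 (subtract max for stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The output is the attention-weighted average of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out, weights = attend([1.0, 0.0], keys, values)
print(len(weights))  # 3: one weight per token in the context
```

Adding a fourth key and value would add a fourth score, and so on: the per-token cost tracks the context length.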

An analogy to the human brain helps to illustrate the problem. As I sit here writing this article, I’m not thinking about what I ate for breakfast in 2019, the acrimonious breakup I had in 2002, or the many episodes of Star Trek I watched in the 1990s. If my brain were constantly thinking about these and thousands of other random topics, I’d be too distracted to write a coherent essay.

But LLMs do get distracted as more tokens are added to their context window — a phenomenon that has been dubbed “context rot.” Anthropic researchers explained it in a September blog post:

Context must be treated as a finite resource with diminishing marginal returns. Like humans, who have limited working memory capacity, LLMs have an “attention budget” that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount, increasing the need to carefully curate the tokens available to the LLM.

This attention scarcity stems from architectural constraints of LLMs. LLMs are based on the transformer architecture, which enables every token to attend to every other token across the entire context. As its context length increases, a model’s ability to capture these pairwise relationships gets stretched thin, creating a natural tension between context size and attention focus.
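The “attention budget” metaphor can be illustrated with a toy calculation of my own (not Anthropic’s): softmax weights always sum to 1, so every added token takes some share. This sketch assumes one relevant token with a fixed score and many irrelevant tokens that all share a lower score, which is far simpler than real attention patterns; the helper name is hypothetical.

```python
import math

def weight_on_needle(needle_score, n_distractors, distractor_score=0.0):
    """Softmax weight left for one relevant token among n equally scored distractors.

    Toy model only: the total weight is fixed at 1, so more distractors
    means less weight on the token that actually matters.
    """
    needle = math.exp(needle_score)
    others = n_distractors * math.exp(distractor_score)
    return needle / (needle + others)

for n in (10, 1_000, 100_000):
    print(n, round(weight_on_needle(5.0, n), 4))
```

Even with a clear score advantage, the relevant token’s share shrinks steadily as the number of distractors grows.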

The blog post went on to discuss context engineering, a suite of emerging techniques for helping LLMs stay focused by removing extraneous tokens from their context windows.
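Context engineering covers many techniques. One of the simplest is dropping older material once a token budget is exceeded. Below is a minimal sketch, assuming a hypothetical `trim_context` helper and a whitespace word count as a stand-in for a real tokenizer; production systems often summarize old material rather than discarding it.

```python
def word_count(text):
    """Crude token counter: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())

def trim_context(system_prompt, messages, budget, count_tokens=word_count):
    """Keep the system prompt plus the most recent messages that fit within `budget`."""
    used = count_tokens(system_prompt)
    kept = []
    # Walk backward from the newest message so recent context is kept first.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system_prompt] + list(reversed(kept))

history = ["old note about breakfast", "user asks about the bug", "tool output: stack trace here"]
print(trim_context("You are a coding assistant.", history, budget=12))
# ['You are a coding assistant.', 'tool output: stack trace here']
```

Recency is only one possible rule for deciding which tokens are extraneous; it is used here because it is easy to show.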

Those techniques are fine as far as they go. But I suspect they can only mitigate the underlying problem. If we want LLMs to reason effectively over much longer contexts, we may have to fundamentally rethink how LLMs work.

Structure matters

In college I paid my rent by working as a web programmer for the University of Minnesota. One of my first projects was to build a simple web application powered by a relational database. It worked fine in testing, but it became glacially slow with real user data. I didn’t understand why.

When I asked a more experienced programmer about it, his first question was “did you add an index to the database?”

“What’s an index?” I asked.

I soon learned that a database index works a lot like the index of a book.

Suppose you’re trying to find the first page in a history book that mentions Abraham Lincoln. If the book has no index, you’ll have to scan every page. This might take several minutes if it’s a long book. But if there is an index, its alphabetical structure will allow you to find the right page in a few seconds.

A database index has the same basic function: organize information so it’s easy to find. As I learned the hard way, an index becomes more and more necessary as data is added to a database.
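The book analogy can be sketched in a few lines, assuming toy page data and using Python’s `bisect` module over a sorted list as a stand-in for an index (real database indexes are usually B-trees or similar structures). The scan checks pages one by one; the index jumps to the right entry with a binary search.

```python
import bisect

# Toy "book": page number -> names mentioned on that page.
pages = {1: ["Washington"], 2: ["Adams"], 3: ["Lincoln", "Grant"], 4: ["Lincoln"]}

def first_page_scan(name):
    """No index: check every page in order (work grows with the number of pages)."""
    for page in sorted(pages):
        if name in pages[page]:
            return page
    return None

# Build the index once: alphabetically sorted (name, page) pairs, like a book's index.
index = sorted((name, page) for page, names in pages.items() for name in names)

def first_page_indexed(name):
    """With an index: binary search the sorted entries (work grows logarithmically)."""
    i = bisect.bisect_left(index, (name, 0))
    if i < len(index) and index[i][0] == name:
        return index[i][1]
    return None

print(first_page_scan("Lincoln"), first_page_indexed("Lincoln"))  # 3 3
```

With four pages the difference is invisible; with millions of rows, it is the difference between a glacially slow application and a fast one.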

This kind of scaling analysis is fundamental to any computer science curriculum. As a computer science major, I learned how to determine whether a computer program will scale gracefully or — like my database with no index — choke when applied to large data sets.

So when I started to study how large language models work, I was shocked to learn that one of the foundational concepts, the attention mechanism, has terrible scaling properties. Before an LLM generates a new token, it compares the most recent token to every previous token in its context window. This means that an LLM consumes more and more computing power — per token — as its context window grows.

To generate the 101st token, the model performs 100 attention operations, one for each previous token. To generate the 1,001st token, it performs 1,000. And these costs are per token, so a session with 10 times more tokens takes about 100 times more computing power.1
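The quadratic growth can be checked with simple counting. This sketch assumes one attention operation per (new token, earlier token) pair and ignores layers, heads, and the per-operation cost that optimizations like FlashAttention reduce.

```python
def attention_ops(n_tokens):
    """Total attention operations to generate a session of n_tokens.

    Token k (0-based) is compared with the k tokens before it, so the total is
    0 + 1 + ... + (n - 1) = n * (n - 1) / 2, which grows quadratically.
    """
    return n_tokens * (n_tokens - 1) // 2

small, large = attention_ops(1_000), attention_ops(10_000)
print(small, large, round(large / small, 1))  # 499500 49995000 100.1
```

Ten times the tokens costs roughly a hundred times the attention work, matching the rule of thumb in the text.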

Good programmers try to avoid using algorithms like this. Unfortunately, nobody has found a viable alternative to attention.

So AI companies have tried to overcome the problem with engineering muscle instead. They’ve developed clever algorithms like FlashAttention that minimize the computational cost of each attention operation. And they’ve built massive data centers optimized for attention calculations. For a while, these efforts had impressive results: context windows grew from 4,096 tokens in 2022 to a million tokens in early 2024.

Industry leaders hope to continue this trend with even more engineering muscle. In a July interview with Alex Kantrowitz, Anthropic CEO Dario Amodei said that “there’s no reason we can’t make the context length 100 million words today, which is roughly what a human hears in their lifetime.”

I don’t doubt that Anthropic could build an LLM with a context window of 100 million tokens if it really wanted to — though using it might be stupendously expensive. But I don’t think anyone will be happy stopping at 100 million tokens.

For one thing, that 100 million figure seems like an underestimate for the number of tokens humans “process” over a lifetime. Studies show the average adult speaks around 15,000 words per day — which works out to around 400 million words over a lifetime. Presumably, most people hear a similar number of words, and read a lot of words as well. They also experience a lot of images, sounds, smells, and other sensations. If we represent all of those experiences as tokens, I bet the total would comfortably exceed 1 billion.
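The lifetime estimate is straightforward arithmetic. The 15,000 words per day comes from the text; the 73-year span is my assumption, chosen because it is roughly what makes the 400 million figure work out.

```python
WORDS_PER_DAY = 15_000   # spoken words per day, as cited in the text
DAYS_PER_YEAR = 365
LIFETIME_YEARS = 73      # assumption: roughly reproduces the ~400 million figure

spoken = WORDS_PER_DAY * DAYS_PER_YEAR * LIFETIME_YEARS
print(f"{spoken:,}")  # 399,675,000

# How many years of speaking alone would fill Amodei's 100 million words?
print(round(100_000_000 / (WORDS_PER_DAY * DAYS_PER_YEAR), 1))  # 18.3
```

By this rough count, 100 million words covers only about 18 years of one person's speech, before counting anything heard, read, or seen.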

Moreover, AI companies aren’t just trying to match human performance, they’re trying to dramatically exceed it. That could easily require models to process a lot more.

More context, more problems

But there’s also a deeper problem. Today’s leading LLMs don’t effectively use the million-token context windows they already have. Their performance predictably degrades as more information is included in the context window.

In November 2023, OpenAI released GPT-4 Turbo, the first model with 128,000 tokens of context. Later that same month, Anthropic released Claude 2.1, the first model with 200,000 tokens of context.

Greg Kamradt was one of the first people to perform a needle-in-a-haystack test on these models. He took a long document and randomly inserted a “needle” sentence like “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.”

Then he’d ask an LLM “what is the best thing to do in San Francisco?” and see if it could answer. He found that both GPT-4 Turbo and Claude 2.1 performed worse on this task as the context length increased — especially if the “needle” was in the middle of the document:
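The general shape of a needle-in-a-haystack test can be sketched as follows. This is a hypothetical harness, not Kamradt's actual code: `build_haystack` and `score` are illustrative names, filler text is synthetic, the model call is left out, and grading by keyword match is an assumption made to keep the example self-contained.

```python
def build_haystack(filler_sentences, needle, depth):
    """Insert `needle` at relative position `depth` (0.0 = start, 1.0 = end)."""
    pos = round(depth * len(filler_sentences))
    return " ".join(filler_sentences[:pos] + [needle] + filler_sentences[pos:])

def score(answer, expected_keywords):
    """Crude grading (assumption): pass if every expected keyword appears."""
    return all(k.lower() in answer.lower() for k in expected_keywords)

needle = ("The best thing to do in San Francisco is eat a sandwich "
          "and sit in Dolores Park on a sunny day.")
filler = [f"Filler sentence number {i}." for i in range(1000)]
prompt = (build_haystack(filler, needle, depth=0.5)
          + "\n\nWhat is the best thing to do in San Francisco?")

# A real run would send `prompt` to a model; this stands in for its reply.
fake_answer = "Eat a sandwich and sit in Dolores Park."
print(score(fake_answer, ["sandwich", "Dolores Park"]))  # True
```

Sweeping `depth` from 0.0 to 1.0 and growing the filler is what exposes the pattern described here: worse recall in longer contexts, especially when the needle sits in the middle.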

Frontier labs worked hard to improve performance on this kind of task. By the time Anthropic released Claude 3 in March 2024, needle-in-a-haystack performance was a lot better. But this is the simplest possible test of long-context performance. What about harder problems?

In February 2025, a team of researchers at Adobe published research on a more difficult variant of the needle-in-a-haystack test. Here the “needle” was a sentence like “Yuki lives next to the Semper Opera House,” and the model would be asked “Which character has been to Dresden?”

To answer this question, you need to know that the Semper Opera House is in Dresden. Leading language models do know this, so if you give them this challenge in a short prompt (a small “haystack”) they usually get it right: at least 80% of the time for each of the models below. But if you give them the same challenge in a larger “haystack” (for example, a 32,000-token prompt), accuracy drops dramatically:

GPT-4o goes from 99% to 70%
Claude 3.5 Sonnet goes from 88% to 30%
Gemini 2.5 Flash goes from 94% to 48%
Llama 4 Scout goes from 82% to 22%
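Absolute drops can hide how much of each model's ability survives. This short calculation uses the rounded figures from the list above, so its results inherit that rounding.

```python
# Accuracy (%) from the list above: (short prompt, 32,000-token prompt).
results = {
    "GPT-4o": (99, 70),
    "Claude 3.5 Sonnet": (88, 30),
    "Gemini 2.5 Flash": (94, 48),
    "Llama 4 Scout": (82, 22),
}

# (percentage points lost, fraction of short-context accuracy retained)
summary = {m: (short - long_, long_ / short) for m, (short, long_) in results.items()}

for model, (drop, retained) in summary.items():
    print(f"{model}: -{drop} points, keeps {retained:.0%} of its short-context accuracy")
```

GPT-4o keeps about 71% of its short-context accuracy, while Llama 4 Scout keeps only about 27%.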

Long-context performance dropped even further when the researchers asked “which character has been to the state of Saxony.” This question required the model to recognize that the Semper Opera House is in Dresden and that Dresden is in Saxony. The longer the context got, the worse models tended to do on questions like this that required two reasoning “hops”:

So not only do LLMs perform worse as more tokens are added to their context, they exhibit more severe performance degradation on more complex tasks.2 I think this bodes poorly for getting LLMs to do the kind of work that takes human workers days, weeks, or even months. These tasks will not only require a lot of tokens, they’re also far more complex than contrived needle-in-a-haystack benchmarks.

The curse of context rot

Photo by Serhii Luzhevskyi via iStock Editorial / Getty Images Plus.

And indeed, technologists have noticed that LLM performance on real-world tasks tends to decline as contexts get longer.

In June, a Hacker News commenter coined the phrase “context rot” to describe the phenomenon where LLMs become less effective as the size of their context grows. The startup Chroma published a widely read study on the phenomenon in July.


No one fully understands how LLMs work, so it’s hard to say exactly why context rot happens. But here’s how I think about it.

Tech leaders insist there is no AI bubble

2025-11-01 02:35:20

Five of the largest hyperscalers (Alphabet, Microsoft, Amazon, Meta, and Oracle) spent a combined $106 billion on capital investments in their most recent quarters. This was the first time those companies had spent more than $100 billion in a single quarter. It was 9% more than the previous quarter and a whopping 73% increase from a year earlier.
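The two growth rates imply approximate earlier spending levels, which can be backed out by dividing. Because the percentages are rounded, treat these as rough figures.

```python
latest = 106.0       # $ billions, combined capex in the most recent quarter
qoq_growth = 0.09    # "9% more than" the previous quarter
yoy_growth = 0.73    # "73% increase from a year earlier"

prev_quarter = latest / (1 + qoq_growth)
year_earlier = latest / (1 + yoy_growth)
print(f"previous quarter: about ${prev_quarter:.0f}B, year earlier: about ${year_earlier:.0f}B")
```

The implied previous quarter of roughly $97 billion is consistent with the second-quarter figure cited in the charts piece.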

Four of these companies — all but Oracle — published quarterly financial reports this week. Here’s an update to a chart we first published on Monday:

In earnings calls this week, company leaders stressed that they were not making speculative long-term bets. They said they needed more data centers just to keep up with customer demand.

AI skeptics and AI boosters are both wrong

2025-10-31 03:52:15

Earlier this month I attended an AI conference called The Curve in Berkeley. A lot of people there were “AGI-pilled.”

For example, I participated in a role-playing exercise organized by Daniel Kokotajlo, a co-author of the AI 2027 report. That report argues that AI systems will soon achieve human-level intelligence. Then they’ll rapidly improve themselves, leading to superhuman AI capabilities and an extreme acceleration of scientific discovery and economic growth.

I also attended a talk by another AGI-pilled writer, Nate Soares, who believes that superintelligent AI will kill everyone.

At the opposite end of the spectrum are skeptics who believe AI is not just overhyped but practically useless. This perspective wasn’t as well represented at the conference, but I appeared on a panel with perennial AI skeptic Gary Marcus. He laid out his case in a New York Times op-ed a couple of weeks ago:

These systems have always been prone to hallucinations and errors. Those obstacles may be one reason generative AI hasn’t led to the skyrocketing profits and productivity that many in the tech industry predicted. A recent study run by MIT’s NANDA initiative found that 95% of companies that did AI pilot studies found little or no return on their investment. A recent financial analysis projects an estimated shortfall of $800 billion in revenue for AI companies by the end of 2030.

Marcus argued that companies should “stop focusing so heavily on these one-size-fits-all tools and instead concentrate on narrow, specialized AI tools engineered for particular problems”—tools like AlphaFold, the Google DeepMind model for predicting protein structures.

Another skeptic is Ed Zitron, who has built a following making the case that OpenAI will never turn a profit because LLMs can’t generate value commensurate with their high inference costs. Zitron expects OpenAI to collapse in the next few years, and suggests that this could set off a tech industry contagion analogous to the failure of Lehman Brothers in 2008.

My view is between these extremes. I think today’s AI has genuinely impressive capabilities that are likely to improve further in the coming months and years. I think the AI industry is likely to be profitable in the long run, and that OpenAI’s basic business model is perfectly reasonable.

But I don’t think we’re very close to human-level intelligence. And I don’t think AI is about to drive the kind of massive social and economic changes that AGI-pilled folks expect.

So I recently found myself nodding along as Andrej Karpathy was interviewed on Dwarkesh Patel’s podcast. Karpathy co-founded OpenAI in 2015 before joining Tesla to lead its self-driving team. Since departing Tesla in 2022, the 39-year-old has become something of an elder statesman in the AI industry. People credit him with coining the phrase “vibe coding” back in February.

A key theme throughout the interview was a palpable frustration with both extremes in the AI debate.

“When I go on my Twitter timeline, I see all this stuff that makes no sense to me,” Karpathy said. He believes that fundraising needs have pushed AI leaders to make unrealistic promises about the pace of progress. At the same time, he said he was “overall very bullish on technology.”

If you have time, I encourage you to listen to the full two-hour conversation; it was packed with deep insights about the state of AI and its likely impact on the broader economy. But for those who don’t have two hours, I’ll highlight the bits I found most interesting.

Then I’ll discuss that MIT study finding that 95% of enterprise AI projects fail. Skeptics like Marcus love to cite it as evidence that AI is useless. But the study’s actual findings were more interesting than that — and they don’t really support the views of either AI skeptics or the AGI-pilled.

16 charts that explain the AI boom

2025-10-28 02:37:58

AI bubble talk has intensified in recent weeks. Record investments combined with economic weakness have left some leery of a dip in investor confidence and potential economic pain.

For now, though, we’re still in the AI boom. Nvidia keeps hitting record highs. OpenAI released another hit product, Sora, which quickly rose to the top of the App Store. Anthropic and Google announced a deal last Thursday to give Anthropic access to up to one million of Google’s chips.

In this piece, we’ll visualize the AI boom in a series of charts. It’s hard to put all of AI progress into one graph. So here are 16.

1. The largest technology companies are investing heavily in AI

If I had to put the current state of AI financials into one chart, it might be this one.

Training and running current AI models require huge, expensive collections of GPUs, stored in data centers. Someone has to invest the money to buy the chips and build the data centers. One major contender: big tech firms, who account for 44% of the total data center market. This chart shows how much money five big tech firms have spent on an annualized basis on capital expenditures, or capex.

Not all tech capex is spent on data centers, and not all data centers are dedicated to AI. The spending shown in this chart includes all the equipment and infrastructure a company buys. For instance, Amazon also needs to pay for new warehouses to ship packages. Google’s capex also covers servers to support Google search.

But a large and increasing percentage of this spending is AI related. For instance, Amazon’s CEO said in its Q4 2024 earnings call that AI investment was “the vast majority” of Amazon’s recent capex. And it will continue to grow. In Meta’s Q2 2025 earnings call, CFO Susan Li noted that “scaling GenAI capacity” will be the biggest driver of increased 2026 capex.

2. AI spending is significant in historical terms

Amazon, Meta, Microsoft, Alphabet, and Oracle spent $241 billion in capex in 2024 — that was 0.82% of US GDP for that year. In the second quarter this year, the tech giants spent $97 billion — 1.28% of the period’s US GDP.
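The two percentages use different time windows: a full year of capex against a full year of GDP, and one quarter of capex against one quarter of GDP. This sketch backs out the GDP figures implied by the text's own numbers (they are not official statistics) to show the denominators are consistent.

```python
capex_2024 = 241.0   # $ billions, full-year 2024
share_2024 = 0.0082  # 0.82% of US GDP
capex_q2 = 97.0      # $ billions, second quarter of 2025
share_q2 = 0.0128    # 1.28% of that quarter's GDP

implied_gdp_2024 = capex_2024 / share_2024  # full-year GDP, $ billions
implied_gdp_q2 = capex_q2 / share_q2        # one quarter of GDP, not annualized
print(round(implied_gdp_2024), round(implied_gdp_q2), capex_q2 * 4)  # 29390 7578 388.0
```

A quarter of roughly $7.6 trillion corresponds to about $30 trillion a year, in line with the 2024 figure, while the second-quarter capex pace annualizes to about $388 billion.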

If this pace of spending continues for the rest of 2025, it will exceed peak annual spending during some of the most famous investment booms in the modern era, including the Manhattan Project, NASA’s spending on the Apollo Project, and the internet broadband buildout that accompanied the dot-com boom.

This isn’t the largest investment in American history — Paul Kedrosky estimated that railroad investments peaked at about 6% of the US economy in the 1880s. But it’s still one of the largest investment booms since World War II. And tech industry executives have signaled they plan to spend even more in 2026.

One caveat about this graph: not all the tech company capex is directed at the US. Probably only around 70 to 75% of the big tech capex is going to the US.[1] However, not all data center spending is captured by big tech capex, and other estimates of US AI spending also lie around 1.2 to 1.3% of US GDP.
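The article’s footnote lists the two proxies behind the 70 to 75% estimate. A minimal sketch of one way to combine them, assuming a simple unweighted average across the five companies (a capex-weighted average would be more faithful, but it needs per-company capex figures that aren’t given here):

```python
# US share of long-lived assets at the end of each company's latest fiscal year
# (percent), as listed in the footnote.
US_ASSET_SHARE = {
    "Microsoft": 60.2,
    "Amazon": 73.5,
    "Google": 75.3,
    "Oracle": 75.6,
    "Meta": 86.2,
}
EPOCH_US_SHARE = 74.0  # Epoch AI: US share of GPU-intensive data center capacity (percent)

# Unweighted mean: each company counts equally, regardless of how much it spends.
unweighted_mean = sum(US_ASSET_SHARE.values()) / len(US_ASSET_SHARE)
lowest = min(US_ASSET_SHARE, key=US_ASSET_SHARE.get)
highest = max(US_ASSET_SHARE, key=US_ASSET_SHARE.get)

print(f"Unweighted mean US asset share: {unweighted_mean:.1f}%")
print(f"Range: {lowest} {US_ASSET_SHARE[lowest]}% to {highest} {US_ASSET_SHARE[highest]}%")
print(f"Epoch AI capacity estimate: {EPOCH_US_SHARE:.0f}%")
```

Both proxies land around 74%, inside the 70 to 75% range, even though individual companies vary from about 60% to 86%.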

3. Companies are importing a lot of AI chips

AI chips have a famously long and complicated supply chain. Highly specialized equipment is manufactured all across the world, with most chips assembled by TSMC in Taiwan. Basically all of the AI chips that tech companies buy for US use have to be imported first.

The Census data displayed above show this clearly. There’s no specific trade category corresponding to Nvidia GPUs (or Google’s TPUs), but the red line corresponds to large computers (“automatic data processing machines” minus laptops), a category that includes most GPU and TPU imports. This has spiked to over $200 billion in annualized spending recently. Similarly, imports of computer parts and accessories (HS 8473.30), such as hard drives or power supply units, have also doubled in the past year.

These imports have been exempted from Trump’s tariff scheme. Without that exemption, companies would have had to pay somewhere between $10 and $20 billion in tariffs on these imports, according to Joey Politano.

4. They’re building a lot of data centers too

The chart above shows the construction costs of all data centers built in the US, according to Census data. This doesn’t include the value of the GPUs themselves, nor of the underlying land. (The Stargate data center complex in Abilene, Texas, is large enough to be seen from space.) Even so, investment is skyrocketing.

In regions where data centers are built, the biggest economic benefits come during construction. A report to the Virginia legislature estimated that a 250,000-square-foot data center (around 30 megawatts of capacity) would employ up to 1,500 construction workers but only 50 full-time workers once construction was completed.

A few select counties do still earn significant tax revenue; Loudoun County, Virginia, earns 38% of its tax revenue from data centers. However, Loudoun County also has the highest data center concentration in the United States, so most areas receive less benefit.

5. Data centers, particularly large ones, are geographically concentrated

This is a map from the National Renewable Energy Laboratory (NREL) which shows the location of data centers currently operating or under construction. Each circle represents an individual data center; larger circles (with a deeper shade of red) are bigger facilities.

There’s a clear clustering pattern, where localities with favorable conditions — like cheap energy, good network connectivity, or a permissive regulatory environment — attract most of the facilities in operation or under construction. This is particularly true of the big data centers being constructed for AI training and inference. Unlike data centers serving internet traffic, AI workloads don’t require ultra-low latency, so they don’t need to be located close to users.

6. Few data centers are being built in California

The chart above shows ten of the largest regions in the US for data center development, according to CBRE, a commercial real estate company. Northern Virginia has the largest data center concentration in the world.

Access to cheap energy is clearly attractive to data center developers. Of the ten markets pictured above, six feature energy prices below the US industrial average of 9.2¢ per kilowatt-hour (kWh), including the five biggest. Despite California’s proximity to tech companies, high electricity rates seem to have stunted data center growth there.

Electricity prices are one major advantage that the US has over the European Union in data center construction. In the second half of 2024, commercial electricity prices in Europe averaged €0.19 per kWh, around double the comparable US rate.

7. Low vacancy and high demand are pushing up data center rents

Companies often rent space in data centers, and rents are frequently charged on a per-kilowatt basis. During the 2010s, these costs fell steadily. But that has changed over the last five years, as the industry has been caught between strong AI-driven demand and increasing physical constraints. Even after adjusting for inflation, the cost of data center space has risen to its highest level in a decade.

According to CBRE, this is especially true of the largest data centers, which “recorded the sharpest increases in lease rates, driven by hyperscale demand, limited power availability and elevated build costs.” Meanwhile, hyperscalers like Microsoft consistently say that demand for their cloud services outstrips their capacity.

This leads to a situation where major construction is paired with record-low vacancy rates, around 1.6% in what CBRE classifies as “primary” markets. So even if tech giants are willing to spend heavily to expand their data centers, physical constraints may prevent them from doing so as quickly as they’d like.

8. Data center power consumption might double by 2030 — or it might not

The International Energy Agency estimates that data centers globally consumed around 415 terawatt-hours (TWh) of electricity in 2024. This figure is expected to more than double to 945 TWh by 2030.

That’s 530 TWh of new demand in six years. Is that a lot? In the same report, the IEA compared it to expected growth in other sources of electricity demand. For example, electric vehicles could add more than 800 TWh of demand by 2030, while air conditioners could add 650 and electric heating could add 450 TWh.
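To restate the IEA projection as a growth rate, the sketch below computes the implied compound annual growth rate from the 2024 and 2030 figures and ranks the added demand against the other categories in the same report. It treats the electric vehicle figure as exactly 800 TWh, although the IEA says “more than” that, so the EV number is a lower bound.

```python
# IEA figures quoted above, in terawatt-hours (TWh) of electricity per year.
DATA_CENTERS_2024 = 415.0
DATA_CENTERS_2030 = 945.0
YEARS = 2030 - 2024

data_center_growth = DATA_CENTERS_2030 - DATA_CENTERS_2024         # 530 TWh
cagr = (DATA_CENTERS_2030 / DATA_CENTERS_2024) ** (1 / YEARS) - 1  # about 14.7% per year

# Added demand by 2030 from each source in the same IEA comparison (TWh).
# The EV figure is a lower bound: the IEA says "more than" 800 TWh.
added_demand = {
    "electric vehicles": 800.0,
    "air conditioners": 650.0,
    "data centers": data_center_growth,
    "electric heating": 450.0,
}
ranking = sorted(added_demand, key=added_demand.get, reverse=True)

print(f"Data center growth: {data_center_growth:.0f} TWh, about {cagr:.1%} per year")
for name in ranking:
    print(f"{name:>18}: {added_demand[name]:.0f} TWh")
```

Data centers grow fast in percentage terms, but in absolute terms they rank third of the four sources in this comparison.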

There’s significant uncertainty in projections of data center demand. McKinsey has estimated that data center demand could grow to 1,400 TWh by 2030. Deloitte expects data center demand to land between 700 and 970 TWh. Goldman Sachs gives a wide range, between 740 and 1,400 TWh.

Unlike demand from electric vehicles or air conditioning, data centers’ electricity demand will be geographically concentrated, which can strain local grids. But the big picture is the same under any of these estimates: data center electricity growth is going to be significant, but it will be only a modest slice of overall electricity growth as the world tries to decarbonize.

9. Water use is an overrated problem with AI

There’s been a lot of media coverage about data centers guzzling water, harming local communities in the process. But total water usage in data centers is small compared to other uses, as shown in this chart using data compiled by Andy Masley.

In 2023, data centers used around 48 million gallons per day, according to a report from Lawrence Berkeley National Laboratory. That sounds like a lot until you compare it to other uses. AI-related data centers use so little water, relative to golf courses or mines, that you can’t even see the bar.

Although some data centers use water to cool computer chips, this actually isn’t the primary way data centers drive water consumption. More water is used by the power plants that generate electricity for data centers. But even including these off-site uses, total water use is about 250 million gallons per day.

That’s “not out of proportion” with other industrial processes, according to Bill Shobe, an emeritus professor of environmental economics at the University of Virginia. Shobe told me that “the concerns about water and data centers seem like they get more time than maybe they deserve compared to some other concerns.”

There are still challenges around data centers releasing heated water back into the environment. But by and large, data centers don’t consume much water. In fact, if there is a data center water-use problem, it’s that some municipalities in dry areas like Texas charge too little for scarce water. If these municipalities priced their water appropriately, companies would have an incentive to use water more efficiently, or perhaps to build data centers in places where water is abundant.
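The on-site versus off-site split described in this section can be made concrete with the two daily figures quoted above. A minimal sketch, assuming both figures describe the same year and using a 365-day year for the annual totals:

```python
# Daily water use quoted above, in millions of gallons per day.
ON_SITE = 48.0                  # used directly by data centers (LBNL estimate for 2023)
INCLUDING_POWER_PLANTS = 250.0  # including water used to generate their electricity

off_site = INCLUDING_POWER_PLANTS - ON_SITE       # 202 million gallons per day
on_site_share = ON_SITE / INCLUDING_POWER_PLANTS  # about 19% of the total

# Convert to annual totals, in billions of gallons.
DAYS_PER_YEAR = 365
on_site_per_year = ON_SITE * DAYS_PER_YEAR / 1000               # about 17.5 billion gallons
total_per_year = INCLUDING_POWER_PLANTS * DAYS_PER_YEAR / 1000  # about 91 billion gallons

print(f"Off-site (power plants): {off_site:.0f} million gallons/day")
print(f"On-site share of the total: {on_site_share:.0%}")
print(f"Annual totals: {on_site_per_year:.1f} vs {total_per_year:.1f} billion gallons")
```

On-site cooling accounts for under a fifth of the total, which is why power plants, not the data centers themselves, drive most of the water footprint.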

10. There’s a lot of demand for AI inference

It’s not often that you get to deal with a quadrillion of something. But in October, Google CEO Sundar Pichai announced that the company was now processing 1.3 quadrillion tokens per month across its product integrations and API offerings. That’s equivalent to processing 160,000 tokens for every person on Earth. That’s more than the length of one Lord of the Rings book for every single person in the world, every month.
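The per-person figure is a simple division. A minimal sketch, assuming a world population of about 8.2 billion (an approximate 2025 figure, not taken from the article) and a 30-day month for the daily rate:

```python
MONTHLY_TOKENS = 1.3e15   # Google's reported monthly token volume
WORLD_POPULATION = 8.2e9  # assumption: approximate world population in 2025
DAYS_PER_MONTH = 30       # assumption: a 30-day month

tokens_per_person_month = MONTHLY_TOKENS / WORLD_POPULATION       # roughly 158,000
tokens_per_person_day = tokens_per_person_month / DAYS_PER_MONTH  # roughly 5,300

print(f"{tokens_per_person_month:,.0f} tokens per person per month")
print(f"{tokens_per_person_day:,.0f} tokens per person per day")
```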

It’s difficult to compare Google’s number with other AI providers. Google’s token count includes AI features inside Google’s own products — such as the AI summary that often appears at the top of search results. But OpenAI has also announced numbers in the same ballpark. On October 6, OpenAI announced that it was processing around six billion tokens per minute, or around 260 trillion tokens per month on its developer API. This was about a four-fold increase from the 60 trillion monthly in January 2025 that The Information reported.
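Converting OpenAI’s per-minute figure into a monthly one, and comparing it with the January figure and Google’s total, takes only a few lines. This sketch assumes a 30-day month, and it sets aside the caveat that the two companies count different things, since Google’s figure includes its own products.

```python
TOKENS_PER_MINUTE = 6e9           # OpenAI developer API, October 2025
MINUTES_PER_MONTH = 60 * 24 * 30  # assumption: a 30-day month (43,200 minutes)
JANUARY_MONTHLY = 60e12           # The Information's January 2025 figure
GOOGLE_MONTHLY = 1.3e15           # Google's reported monthly total

openai_monthly = TOKENS_PER_MINUTE * MINUTES_PER_MONTH   # about 259 trillion
growth_since_january = openai_monthly / JANUARY_MONTHLY  # about 4.3x
google_vs_openai = GOOGLE_MONTHLY / openai_monthly       # about 5x

print(f"OpenAI API: {openai_monthly / 1e12:.0f} trillion tokens per month")
print(f"Growth since January: {growth_since_january:.1f}x")
print(f"Google's total is {google_vs_openai:.1f}x OpenAI's API volume")
```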

11. Consumer AI products are getting more popular — especially ChatGPT

Consumer AI usage has steadily increased over the past three years. While ChatGPT famously reached a million users within five days of its release, it took another 11 months for the service to reach 100 million weekly active users. Since then, reported weekly active users have grown to 800 million, though the true number may be slightly lower. An academic paper co-written by OpenAI researchers noted that their estimates double-counted users with multiple accounts; the numbers given by executives may similarly be overestimates.

Other AI services have grown more slowly: Google’s Gemini has 450 million monthly active users per CEO Sundar Pichai, while Anthropic’s Claude currently has around 30 million monthly active users, according to Business Insider.

12. Tech giants have enough profits to pay for their AI investments

With such high levels of AI investment, one might worry about the financial stability of the firms behind the data center rollout. But for most of the big tech firms, this isn’t a huge issue. Cash flow from operations continues to exceed their infrastructure spending.

There is some variation among the set, though. Google earns so much money from search that it started paying dividends in 2024, even amid the capex boom. Microsoft and Meta have also reported solid financial performance. On the other hand, both Amazon and Oracle have had a few recent quarters with negative free cash flow. (Amazon’s overall financial health is excellent; Oracle has been accumulating debt, partly as a result of aggressive stock buybacks.)

There are some reasons to take companies’ reported numbers with a grain of salt. Meta recently took a 20% stake in a $27 billion joint venture that will build a data center in Louisiana, which Meta will operate. This allows Meta to acquire additional data center capacity without paying the full costs upfront. Notably, Meta agreed to compensate its partners if the data center loses significant value over the first 16 years, which means the deal could be expensive for Meta in the event of an AI downturn.

13. OpenAI expects to lose billions over the next five years

Tech giants like Google, Meta, and Microsoft can finance AI investments using profits from their non-AI products. OpenAI, Anthropic, and xAI do not have this luxury. They need to raise money from outside sources to cover the costs of building data centers and training new models.

This chart, based on reporting from The Information, shows recent OpenAI internal projections of its own cash flow needs. At the start of 2025, OpenAI expected to reach a peak negative cash flow ($20 billion) in 2027. OpenAI expected smaller losses in 2028 and positive cash flow in 2029.

But in recent months, OpenAI’s projections have gotten more aggressive. Now the company expects to reach peak negative cash flow (more than $40 billion) in 2028. And OpenAI doesn’t expect to reach positive cash flow until 2030.

So far, OpenAI hasn’t had trouble raising money; many people are eager to invest in the AI boom. But if public sentiment shifts, fundraising opportunities could dry up quickly.

14. OpenAI deals boost partners’ stock

Over the past two months, OpenAI has made four deals that could lead to the construction of 30 gigawatts of additional data center capacity. According to CNBC, one gigawatt of data center capacity costs around $50 billion at today’s prices. So the overall cost of this new infrastructure could be as high as $1.5 trillion — far more than the $500 billion valuation given to OpenAI in its last fundraising round.
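The $1.5 trillion figure is a single multiplication on CNBC’s rough per-gigawatt cost; the sketch below also compares it with OpenAI’s last valuation. It inherits every assumption in that $50 billion estimate, including that today’s prices hold for the whole buildout.

```python
DEAL_CAPACITY_GW = 30     # capacity covered by OpenAI's four recent deals
COST_PER_GW = 50e9        # CNBC's rough cost of one gigawatt at today's prices (USD)
OPENAI_VALUATION = 500e9  # valuation in OpenAI's last fundraising round (USD)

total_cost = DEAL_CAPACITY_GW * COST_PER_GW        # $1.5 trillion
cost_to_valuation = total_cost / OPENAI_VALUATION  # 3.0

print(f"Total cost: ${total_cost / 1e12:.1f} trillion")
print(f"That is {cost_to_valuation:.0f}x OpenAI's last valuation")
```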

Each of these deals was made with a technology company acting as an OpenAI supplier: three with chipmakers and one with Oracle, which builds and operates data centers.

OpenAI got favorable terms in each of these deals. Why? One reason is that partnering with OpenAI boosted the partners’ stock prices. In total, the four companies gained $636 billion in stock value on the days their respective deals were announced. (Some stocks have since decreased slightly in value.)

It’s unclear whether these deals will fully come to fruition as planned. Thirty gigawatts is a huge amount of capacity: almost two-thirds of the total American data center capacity in operation today, according to Baxtel, a data center consultancy.

It also dwarfs OpenAI’s current data center capacity. In a recent internal Slack note, reported by Alex Heath of Sources, Sam Altman wrote that OpenAI started the year with “around” 230 megawatts of capacity, and that the company is “now on track to exit 2025 north of 2 gigawatts of operational capacity.”
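The capacity comparisons in this section can be lined up side by side. A minimal sketch, treating “almost two thirds” as exactly two-thirds (so the implied US total of about 45 GW is a slight underestimate) and “north of 2 gigawatts” as exactly 2 GW:

```python
DEAL_CAPACITY_GW = 30.0  # capacity covered by OpenAI's four deals
SHARE_OF_US = 2 / 3      # "almost two thirds" of US capacity in operation (Baxtel)
START_2025_GW = 0.23     # about 230 MW at the start of 2025
END_2025_GW = 2.0        # "north of 2 gigawatts" expected by the end of 2025

implied_us_capacity = DEAL_CAPACITY_GW / SHARE_OF_US  # about 45 GW (a lower bound)
growth_during_2025 = END_2025_GW / START_2025_GW      # about 8.7x in one year
deals_vs_year_end = DEAL_CAPACITY_GW / END_2025_GW    # 15x OpenAI's year-end capacity

print(f"Implied US data center capacity: about {implied_us_capacity:.0f} GW")
print(f"OpenAI capacity growth during 2025: {growth_during_2025:.1f}x")
print(f"Deal capacity vs. year-end capacity: {deals_vs_year_end:.0f}x")
```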

15. OpenAI’s annualized revenue has risen to $13 billion

The key question for investors is how quickly AI startups can grow their revenue. Thus far, both OpenAI and Anthropic have shown impressive revenue growth, at least according to figures reported in the media. OpenAI expects $13 billion in revenue in 2025, while Anthropic recently told Reuters that its “annual revenue run rate is approaching $7 billion.”

Both companies are still losing billions of dollars a year, however, so continued growth is necessary.

OpenAI and Anthropic have different primary revenue streams. 70% of OpenAI’s revenue comes from consumer ChatGPT subscriptions. Meanwhile, Anthropic earns 80% of its revenue from enterprise customers, according to Reuters. Most of that revenue appears to come from selling access to Claude models via an API.

16. OpenAI predicts huge revenue growth

This chart from The Information shows OpenAI’s internal revenue projections.

After generating $13 billion this year, OpenAI hopes to generate $30 billion next year, $60 billion in 2027, and a whopping $200 billion in 2030. As you can see, OpenAI’s revenue projections have gotten more optimistic over the course of 2025. At the start of the year, the company was projecting “only” $174 billion in revenue in 2030.
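The projected path implies different growth rates in different years. The sketch below computes them from the figures above; the `cagr` helper is an illustrative name, and the 2027 to 2030 rate is a smoothed average, since the reported projections quoted here skip 2028 and 2029.

```python
# OpenAI's internal revenue projections as reported by The Information (billions of USD).
PROJECTION = {2025: 13.0, 2026: 30.0, 2027: 60.0, 2030: 200.0}
EARLY_2025_TARGET_FOR_2030 = 174.0  # the 2030 projection at the start of 2025


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


growth_2026 = PROJECTION[2026] / PROJECTION[2025]                    # about 2.3x
growth_2027 = PROJECTION[2027] / PROJECTION[2026]                    # 2.0x
rate_2027_to_2030 = cagr(PROJECTION[2027], PROJECTION[2030], 3)      # about 49% per year
upward_revision = PROJECTION[2030] / EARLY_2025_TARGET_FOR_2030 - 1  # about 15%

print(f"2025->2026: {growth_2026:.1f}x, 2026->2027: {growth_2027:.1f}x")
print(f"2027->2030: {rate_2027_to_2030:.0%} per year on average")
print(f"2030 target raised by {upward_revision:.0%} since the start of the year")
```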

OpenAI hopes to diversify its revenue streams over the next few years. The company expects ChatGPT subscriptions will continue to be the biggest moneymaker. But OpenAI is looking for healthy growth in its API business. And the company hopes that agents like Codex will generate tens of billions of dollars per year by the end of the decade.

The AI giant is also looking to generate around $50 billion in revenue from new products, including showing ads to free users of OpenAI products. OpenAI needs strategies to make money off the 95% of ChatGPT users who do not currently pay for a subscription. This is probably a large part of the logic behind OpenAI’s recent release of in-chat purchases.

Anthropic has similarly forecast that its annualized revenue could reach $26 billion by the end of 2026, up from $6 to $7 billion today.

These predictions are aggressive: a recent analysis by Greg Burnham of Epoch AI was unable to find any American companies that have gone from $10 billion in annual revenue to $100 billion in less than seven years. OpenAI predicts that it will take fewer than four.

On the other hand, Burnham found that OpenAI was potentially the second fastest company ever to go from $1 billion to $10 billion, after pandemic-era Moderna. If OpenAI can sustain its current pace of growth (roughly 3x per year), it will be able to hit its revenue targets.

Whether OpenAI and Anthropic can sustain that kind of growth is already a trillion-dollar question.
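The gap between OpenAI’s current growth and what its 2030 target requires can be sketched with compound growth. This assumes five years of compounding from the $13 billion 2025 figure to the $200 billion 2030 target, and reads Burnham’s finding as a cap on average growth: a company that went from $10 billion to $100 billion in seven years averaged about 10^(1/7), or roughly 1.39x, per year.

```python
import math

REVENUE_2025 = 13.0     # billions of USD
TARGET_2030 = 200.0     # billions of USD
CURRENT_MULTIPLE = 3.0  # "roughly 3x per year"

# Annual multiple needed to reach the 2030 target in five years: about 1.73x.
required_multiple = (TARGET_2030 / REVENUE_2025) ** (1 / 5)

# A 10x increase over seven years averages about 1.39x per year; Burnham found no
# American company that went from $10B to $100B faster than that.
benchmark_multiple = 10 ** (1 / 7)


def years_to_grow(factor: float, annual_multiple: float) -> float:
    """Years needed to grow by `factor` at a constant annual multiple."""
    return math.log(factor) / math.log(annual_multiple)


print(f"Required: {required_multiple:.2f}x per year (current pace: {CURRENT_MULTIPLE:.0f}x)")
print(f"Benchmark pace for 10x in seven years: {benchmark_multiple:.2f}x per year")
print(f"Years to grow 10x at 3x per year: {years_to_grow(10, CURRENT_MULTIPLE):.1f}")
print(f"Years to grow 10x at the required pace: {years_to_grow(10, required_multiple):.1f}")
```

The required pace is well below OpenAI’s current growth but well above the roughly 1.39x average implied by Burnham’s benchmark, which is why the projections look both achievable and historically unprecedented.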

Thanks to Joey Politano, Nat Purser, and Cathy Kunkel for helpful comments on this article.

[1] This number is based on two proxies. First, Epoch AI estimates that 74% of current GPU-intensive data center capacity is located in the United States. Second, big tech companies have reported that a large majority of their long-lived assets (which include data centers) are located in the US. Specifically, at the end of its most recent fiscal year Microsoft had 60.2% of its long-lived assets in the US. The figure was 73.5% for Amazon, 75.3% for Google, 75.6% for Oracle, and 86.2% for Meta.
