
Tomasz Tunguz

I’ve been a venture capitalist since 2008. I was a PM on the Ads team at Google and worked at Appian before.

Rss preview of Blog of Tomasz Tunguz

My Favorite Books of 2025

2025-12-29 08:00:00

This year I traveled through systems, human & machine, from the mathematics of complexity to industrial espionage.

  1. The Complex World: An Introduction to the Foundations of Complexity Science: Donella Meadows’ Thinking in Systems introduced me to feedback loops a decade ago. This book goes deeper, surveying where complexity science stands today.
  2. Math Without Numbers by Milo Beckman: A vivid & accessible tour of abstract mathematics. Beckman covers topology, infinities larger than infinity, & other mind-bending concepts, all without a single digit & with proofs to explain it all.
  3. On Democracy by E. B. White: White, New Yorker editor, Charlotte’s Web author, Elements of Style co-author, is among my favorite writers. These essays provide a time capsule to help us understand where we are today. Written as fascism spread across Europe & America debated isolationism, White’s defense of America provides a window into another era of rapid political change.
  4. Breakneck: China’s Quest to Engineer the Future by Dan Wang: Wang argues that the US & China each have a dominant form of government problem-solving : through laws in the US, or through engineering in China.
  5. God Save Texas by Lawrence Wright: An exploration of the Lone Star State by the Pulitzer Prize-winning Wright. Both California & Texas value independence & innovation. As a Californian, it was fascinating to see Texas through Wright’s eyes.
  6. The NVIDIA Way by Tae Kim: Jensen Huang believes Nvidia’s worst enemy isn’t competition but complacency. Kim’s portrait reveals a CEO who spends late nights alongside his team, torturing them into greatness.
  7. The Unaccountability Machine by Dan Davies: Davies argues that modern organizations function like runaway AIs, making decisions no human intends. A hotel executive cuts staff to improve the balance sheet. Later, you can’t check into your room & the clerk can only offer a voucher. There’s no one to call, no way to communicate back. That’s an unaccountability machine.
  8. Karla’s Choice by Nick Harkaway: I’ve read every le Carré. His son Harkaway picks up where his father left off, adding a bracing entry to the canon.
  9. Titanium Noir by Nick Harkaway: Before spy novels, Harkaway spent fifteen years writing science fiction. Curious about his earlier work, I was not disappointed. Titanium Noir explores a world in which the wealthy have access to drugs that double their lifespan & double their size. The novel examines what happens when health becomes a function of wealth.
  10. Boom: Bubbles & the End of Stagnation by Byrne Hobart & Tobias Huber: Why does transformative progress require financial bubbles? This builds on Carlota Perez’s work on technology innovation cycles. Hobart & Huber argue that bubbles’ poor accountability shelters the world’s most important breakthroughs.

What should I read in 2026?

Motive S-1 Analysis: How 7 Key Metrics Stack Up

2025-12-28 08:00:00

Motive, the AI-powered fleet management company formerly known as KeepTruckin, filed their S-1.

Founded in 2013 by Shoaib Makani, Ryan Johns, & Obaid Khan, the company has grown from an electronic logging device (ELD)[1] compliance tool into a comprehensive physical operations platform serving nearly 100,000 customers across trucking, construction, oil & gas, & manufacturing.

Motive’s platform has since expanded beyond compliance to combine AI-powered dashcams for driver safety, GPS tracking for real-time visibility, & spend management cards to control costs. This suite acts as a central operating system for physical economy businesses, unifying data from vehicles, drivers, & equipment into a single interface.

| Metric | Motive (2025) | Samsara (at IPO) |
| --- | --- | --- |
| ARR | $501M | $492M |
| ARR Growth | 27% | 76% |
| Gross Margin | 70% | 70% |
| Core Customers (>$7.5k / >$5k) | 9,201 | 13,000+ |
| Large Customers (>$100k) | 494 | 715 |
| Core NDR | 110% | 115% |
| Large NDR | 126% | >125% |
| Net Income Margin | -42% | -34% |
| Employees | 4,508 | ~1,500 |
| ARR / Employee | $111k | $328k |
| ACV (Large >$100k)[2] | $375k | $303k |
| ACV (Total) | $5k | $17k |
| Equity Raised | $600M | $930M |

Both companies achieved roughly $500m in ARR at the time of IPO.

[Chart: Motive vs. Samsara revenue comparison]

Samsara grew 76% annually at IPO compared to Motive’s 27%. This growth was fueled by higher sales efficiency, likely driven by deal size; Samsara’s average contract value (ACV) of $17k was more than three times Motive’s $5k.

[Chart: Motive vs. Samsara ARR growth]

This disparity likely stems from different initial go-to-market strategies. Motive initially focused on the SMB segment (specifically owner-operators & small fleets needing a cost-effective compliance solution for the ELD mandate), building a massive base of smaller customers.

Motive’s ACV of $5k closely mirrors Fleetmatics’[3] ACV of $6.8k at its IPO, confirming Motive joined Fleetmatics as a high-volume SMB player, but with modern AI capabilities. In contrast, Samsara targeted mid-market industrial operations from the start.

While legacy players like Geotab & Verizon Connect (Fleetmatics) maintain large installed bases, Samsara & Motive are capturing share with modern, AI-first platforms.

| Company | Revenue / ARR | Connected Assets | Status |
| --- | --- | --- | --- |
| Geotab | $1B (Est) | 5M+ | Market Share Leader |
| Samsara | $1.52B (LTM) | 2M+ | Revenue Leader |
| Verizon Connect | $600M+ (Est) | 2M+ | Incumbent |
| Motive | $501M (ARR) | 500k+ | Challenger |

Motive’s customer metrics reveal an enterprise-focused growth strategy. Large customers (>$100k ARR) grew 58% year-over-year, from 312 to 494. This 58% growth in large accounts compares favorably to Samsara, which grew its $100k+ customer count 48% year-over-year at the time of its IPO.

Motive’s enterprise accounts grow at 126% net dollar retention, meaning the average large customer spends 26% more each year. The company is successfully landing & expanding within enterprise accounts.

Core customers grew at 17%, from 7,875 to 9,201.
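The growth rates quoted above follow directly from the customer counts. A minimal sketch of that arithmetic, using only the counts given in the text (the function name is mine, for illustration):

```python
def yoy_growth(prior: float, current: float) -> float:
    """Year-over-year growth as a fraction (0.58 means 58%)."""
    return current / prior - 1


# Customer counts as quoted in the text.
large_growth = yoy_growth(312, 494)      # large customers (>$100k ARR)
core_growth = yoy_growth(7_875, 9_201)   # core customers

print(f"Large customers: {large_growth:.0%}")  # Large customers: 58%
print(f"Core customers:  {core_growth:.0%}")   # Core customers:  17%
```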

[Chart: Motive customer growth]

Both companies maintain similar 70% gross margins, impressive for businesses that ship hardware with their software. This positions them at the median for public SaaS companies, despite the hardware component.

The profitability picture differs. Motive’s net loss margin expanded from -35% in 2023 to -42% in the most recent period, while Samsara improved from -100% to -34% in the nine months before its IPO. Samsara’s improvement was driven by operating leverage : revenue grew 108% while sales & marketing expenses increased 11%.

Despite similar gross margins, Motive’s bottom line is weighed down by significantly higher “Other Expense” ($57M in the last nine months). This figure includes $22M in interest expense on approximately $300M of term debt, with the remainder driven by non-cash charges related to convertible securities.

Samsara generated $328k in ARR per employee at IPO. Motive generates $111k, roughly one-third. With 4,508 employees versus Samsara’s 1,500 at IPO, Motive has built a much larger organization to achieve similar scale; about 3.2k of those employees are in Pakistan.
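For readers who want to reproduce the efficiency & deal-size figures, here is a rough sketch. The inputs come from the comparison table & the ACV footnote (37% & 44% large-customer ARR shares); the helper names are hypothetical, & Samsara’s headcount is the approximate 1,500 quoted above.

```python
def arr_per_employee(arr_usd: float, employees: int) -> float:
    """ARR divided by headcount."""
    return arr_usd / employees


def large_customer_acv(arr_usd: float, large_share: float, large_customers: int) -> float:
    """ARR attributable to large customers, divided by the number of large customers."""
    return arr_usd * large_share / large_customers


companies = {
    # name: (ARR in USD, employees, large-customer ARR share, large-customer count)
    "Motive": (501e6, 4_508, 0.37, 494),
    "Samsara": (492e6, 1_500, 0.44, 715),  # headcount is approximate
}

results = {}
for name, (arr, employees, share, large_count) in companies.items():
    results[name] = (
        arr_per_employee(arr, employees),
        large_customer_acv(arr, share, large_count),
    )
    per_employee, acv = results[name]
    print(f"{name}: ${per_employee / 1e3:.0f}k ARR/employee, ${acv / 1e3:.0f}k large-customer ACV")
```

Run as written, this should land on the table’s $111k & $328k per employee & the $375k & $303k large-customer ACVs, after rounding.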

[Chart: Motive vs. Samsara efficiency]

Samsara trades at approximately 14x forward revenue, roughly in line with other vertical SaaS companies.

Motive has raised $600 million from Kleiner Perkins, GV, BlackRock, & others at a $2.85 billion valuation. With $500M in ARR, that implies a roughly 6x ARR multiple in the private markets.

Motive has raised approximately $600 million in equity capital, while Samsara’s path to IPO was significantly more capital-intensive, with over $930 million raised pre-IPO. However, Motive’s reliance on debt, roughly $300 million in term loans, partially offsets this difference in total capitalization, highlighting the contrasting financing strategies of the two leaders.

Given these factors, what is Motive worth? Using an interaction model[4], which is a refinement of the initial linear model, the analysis implies a valuation of approximately $3.7 billion.
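The regression coefficients behind the interaction model aren’t published here, so the sketch below does not reproduce the model. It only applies the NTM revenue formula from the footnote (Current ARR × (1 + Growth/2)) & backs out the revenue multiple that a $3.7 billion valuation would imply. The function name is an assumption for illustration.

```python
def ntm_revenue(current_arr: float, growth: float) -> float:
    """Estimated next-twelve-months revenue: ARR x (1 + growth / 2)."""
    return current_arr * (1 + growth / 2)


current_arr = 501e6   # ARR from the comparison table
growth = 0.27         # forward growth rate assumed in the footnote
valuation = 3.7e9     # valuation implied by the analysis

ntm = ntm_revenue(current_arr, growth)
implied_multiple = valuation / ntm

print(f"Estimated NTM revenue: ${ntm / 1e6:.1f}M")       # Estimated NTM revenue: $568.6M
print(f"Implied NTM multiple:  {implied_multiple:.1f}x")  # Implied NTM multiple:  6.5x
```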

[Chart: Motive valuation sensitivity]

Congratulations to the Motive team on reaching this milestone. Building a $500M ARR business in physical operations is no small feat, especially in a competitive market.


  1. An ELD is a hardware sensor that connects to a vehicle’s engine to track driving hours, a requirement mandated by federal law for safety. ↩︎

  2. ACV for Large Customers is calculated by multiplying total ARR by the large-customer segment share (37% for Motive, 44% for Samsara) & dividing by the reported large customer counts (494 & 715, respectively) as disclosed in the filings. All other figures are pulled directly from the S-1 & IPO prospectuses. ↩︎

  3. Verizon acquired Fleetmatics in 2016 for $2.4 billion in cash, representing a roughly 7.0x forward revenue multiple. ↩︎

  4. Updated on Dec 30, 2025. The valuation model uses an Interaction Model (Growth × Margin) which improved statistical fit (R-squared 0.35 → 0.41). The model assumes a 27% forward growth rate (consistent with reported ARR growth) and a -42% net income margin. For comparison, the most recent nine-month historical GAAP results show 21.7% revenue growth and a -42.3% net income margin. The final valuation is derived by applying the predicted multiple to Motive’s estimated NTM revenue (Current ARR × (1 + Growth/2)). ↩︎

Scoring 2025's Predictions

2025-12-23 08:00:00

Every year I make a list of predictions & score the previous year’s. You can find my 12 Predictions for 2026 here. 2025 was a good year : I scored 7.85 out of 10.

1. The IPO market rips.

| Company | Sector | Market Cap, $b | vs Last Private Round |
| --- | --- | --- | --- |
| CoreWeave | AI Infrastructure | 40.5 | 2.1x |
| Circle | Stablecoin/Fintech | 20.3 | 2.2x |
| Figma | Design Software | 18.85 | 0.9x |
| Chime | Digital Banking | 11.6 | 0.5x |
| Hinge Health | Health Tech | 3.8 | 0.6x |

Score : 0.6.

46 software IPOs raised $12.3b in 2025, up from 21 IPOs raising $3.8b in 2024. The 2021 peak saw 126 tech IPOs raise over $150b.[1] CoreWeave & Circle successfully debuted with significant market caps & strong post-IPO performance.[2] However, others like Figma & Chime are trading below their last private valuations, reflecting a more discerning public market. We also didn’t see some of the most anticipated private companies, like SpaceX, Stripe, & Databricks, go public, although 2026 is a new year.

2. Google continues their surge in AI.

Score : 1.

Google has reclaimed its position at the apex of the AI landscape, ranking in the top tier of nearly every major category. Gemini 3 represents a fundamental leap in pre-training efficiency & multimodal integration, a thesis explored in The Scaling Wall Was A Mirage.[3]

Gemini 3 Flash[4] has redefined the frontier for performance & latency, becoming the default engine for high-frequency agentic workflows.

In the open-source arena, the Gemma models[5] consistently hold the top spots for their weight classes, offering 70B-level reasoning in 27B packages. Even in creative media, Google’s video models[6] rank in the top three globally, prioritizing temporal consistency & character stability for enterprise use.

3. Voice becomes a dominant interface for people with AI as speech models are pushed on device & the accuracy/latency astounds.

Score : 1.

OpenAI reported that ChatGPT voice chat accounts for 19% of total user engagement as of October 2025.[7] Globally, there are now 8.4b voice assistants in use, with 153m users in the US alone.[8] 80% of businesses plan to integrate AI-driven voice into operations by 2026.[9] Dictation with Whisper & WisprFlow, & conversations with agents like Gemini Live, are now normal.

4. US VC investment remains roughly around $210-$230b, but VC fundraising increases by 20%

Score : 0.5.

US VC investment hit the mark, landing at approximately $220b for 2025, driven by massive AI rounds.[10] However, the fundraising prediction missed. While deal counts rose 11% in early 2025[11], actual US VC fundraising is on track for a ~20% decline, totaling roughly $65b for the year.[12] A prolonged liquidity crunch & a slow exit environment kept LPs cautious, despite the enthusiasm for AI.

5. Consolidation is the theme for the Modern Data Stack.

Score : 1.

2025 was a record year for data infrastructure M&A, as the “Modern Data Stack” shifted from a collection of best-of-breed tools to a vertical race for integrated platforms.

| Acquirer | Target | Value ($b) | Strategic Layer |
| --- | --- | --- | --- |
| IBM | Confluent | 11.0 | Real-time Data Streaming |
| Salesforce | Informatica | 8.0 | Data Governance |
| dbt Labs | Fivetran | - | Data Integration |
| CoreWeave | Weights & Biases | 1.7 | MLOps Software |
| OpenAI | Statsig | 1.1 | Product Analytics |
| Databricks | Neon | 1.0 | Serverless Database |

The consolidation moved down the stack, proving that the race is now for power, compute, & integrated software. Most notably, CoreWeave’s acquisitions signal the rise of the “Full-Stack Hyperscaler,” owning everything from the GPU to the MLOps layer.[13][14][15]

6. The first $100m ARR company with 30 or fewer employees is created.

| Company | ARR | Employees | ARR/Employee |
| --- | --- | --- | --- |
| Cursor | $100m | 12 | $8.3m |
| Midjourney | $500m | 100 | $5.0m |

For comparison, Slack had 650 employees at $100m ARR. Ramp had 275. Wiz had 400.

Score : 1.

AI-native teams have achieved unprecedented efficiency. Cursor hit $100m ARR in January 2025 with just 12 employees[16], proving that agentic software can scale with minimal headcount. Midjourney continues to defy gravity, reaching $500m ARR with a team of roughly 100.[17] These teams leverage the capital efficiency of agentic software to meet ravenous consumer & enterprise demand.[18]

7. After years of declines, the US web3 engineering population grows by 25% as the government embraces crypto & web3.

Score : 1.

US Web3 jobs grew by 26% in 2025, reaching 21,600 positions.[19] The regulatory environment shifted significantly, leading to a surge in institutional adoption & a new wave of consumer applications built on decentralized stacks.

8. Data center spending by hyperscalers eclipses $125b for the year as the AI race fuels demand for GPUs. Broadcom is the hottest semiconductor stock of the year.

Score : 0.75.

Hyperscaler CapEx far exceeded expectations, reaching an estimated $315b to $350b in 2025. Amazon alone spent $100b[20], followed by Microsoft ($80b)[21] & Google ($75b)[22]. Broadcom’s stock surged as it became the primary beneficiary of the AI networking buildout, outperforming even NVIDIA in the latter half of the year, but it finished third for the year, behind Micron & Google.[23][24]

| Company | Ticker | YTD Return (2025) |
| --- | --- | --- |
| Micron | MU | 198% |
| Google | GOOGL | 62% |
| Broadcom | AVGO | 46% |
| NVIDIA | NVDA | 35% |
| Microsoft | MSFT | 16% |

9. Stablecoin supply increases 50% to $300b as more businesses adopt this payment mechanism for B2B payments. Stablecoin volume is greater than 3x Visa’s transaction volume.

Score : 1.

Stablecoin supply hit $310b in December 2025.[25] Monthly adjusted stablecoin volume has now surpassed Visa’s network volume, with annual on-chain volume exceeding $46t, nearly 3x Visa’s transaction volume.[26] B2B adoption has accelerated as businesses seek faster, cheaper cross-border settlement.

10. Observability, SIEM, & Business Intelligence begin to use the same data lake. Usage-based pricing for many software companies creates a need for a single data lake. The data lake becomes the dominant data architecture across all workloads.

Score : 0.

While there is some convergence in the use of OpenTelemetry (OTel) for both security & observability[27], the broader vision of a single data lake for BI, SIEM, & observability has not materialized. Enterprises continue to maintain siloed architectures for performance & compliance reasons, & the cost savings from hybrid lakehouse models haven’t been enough to force a total consolidation.[28][29]
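For anyone checking the tally, the ten scores above sum as follows. This is only a quick sketch of the arithmetic; the dictionary keys are the prediction numbers.

```python
# Score awarded to each 2025 prediction, keyed by prediction number.
scores = {1: 0.6, 2: 1, 3: 1, 4: 0.5, 5: 1, 6: 1, 7: 1, 8: 0.75, 9: 1, 10: 0}

total = sum(scores.values())
print(f"{total:.2f} out of {len(scores)}")  # 7.85 out of 10
```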


  1. Renaissance Capital, “2025 IPO Market Review.” ↩︎

  2. StockTitan, “CoreWeave, Circle, and Figma IPO Performance 2025.” ↩︎

  3. Tom Tunguz, “The Scaling Wall Was A Mirage,” Nov 2025. ↩︎

  4. Artificial Analysis, “Gemini 3 Flash Latency Benchmarks,” Jan 2026. ↩︎

  5. Hugging Face, “Open LLM Leaderboard v3,” Feb 2026. ↩︎

  6. LMSYS, “Video Arena Leaderboard Q1 2026.” ↩︎

  7. SQ Magazine, “OpenAI Advanced Voice Mode Engagement Statistics.” ↩︎

  8. DemandSage, “Voice Assistant Usage Statistics 2025.” ↩︎

  9. Verloop.io, “Enterprise Voice AI Adoption Trends.” ↩︎

  10. Venture Capital Journal, “AI Startup Funding 2025.” ↩︎

  11. Juniper Square, “Q1 2025 VC Deal Count Trends,” Apr 2025. ↩︎

  12. PitchBook, “US VC Fundraising Concentration H1 2025.” ↩︎

  13. AI Data Insider, “Major Data Infrastructure M&A 2025.” ↩︎

  14. Orrick, “CoreWeave Completes Acquisition of Weights & Biases,” May 2025. ↩︎

  15. Tracxn, “CoreWeave Acquires OpenPipe,” Sept 2025. ↩︎

  16. Sacra, “Cursor: The AI Code Editor Scaling to $100M ARR,” Jan 2025. ↩︎

  17. Quantumrun, “Midjourney ARR and Employee Count 2025,” Oct 2025. ↩︎

  18. SaaStr, “How Cursor Scaled to $1B ARR in 11 Months,” Nov 2025. ↩︎

  19. Coincub, “US Web3 Jobs Market Report 2025.” ↩︎

  20. DCPulse, “Hyperscaler CapEx Forecast 2025.” ↩︎

  21. Network World, “Microsoft’s $80B AI Data Center Investment,” July 2025. ↩︎

  22. Investopedia, “Alphabet’s $75B Infrastructure Plan,” Apr 2025. ↩︎

  23. Seeking Alpha, “Broadcom vs NVIDIA: The AI Networking Race,” Dec 2025. ↩︎

  24. NerdWallet, “Best Performing Semiconductor Stocks 2025,” Dec 2025. ↩︎

  25. MEXC, “Stablecoin Market Cap December 2025.” ↩︎

  26. a16z Crypto, “State of Crypto Report 2025.” ↩︎

  27. Elastic, “OpenTelemetry: The Convergence of Observability and Security,” 2025. ↩︎

  28. Market.us, “Data Lake Market Size and Lakehouse Adoption 2025.” ↩︎

  29. Cy5.io, “Security Data Lake and SIEM Convergence Costs.” ↩︎

12 Predictions for 2026

2025-12-22 08:00:00

Every year I make a list of predictions & score last year’s predictions. 2025 was a good year : I scored 7.85 out of 10. I will release the scoring tomorrow. For today, here are my predictions for 2026 :

1. Businesses pay more for AI agents than people for the first time.

This has already happened with consumers. Waymo rides cost 31% more than Uber on average, yet demand keeps growing.[1] Riders prefer the safety & reliability of autonomous vehicles. For rote business tasks, agents will command a similar premium as companies factor in onboarding, recruiting, training, & management costs.

2. 2026 becomes a record year for liquidity.

SpaceX, OpenAI, Anthropic, Stripe, & Databricks IPO, with SpaceX & OpenAI ranking among the ten largest offerings ever. The pent-up demand from 4+ years of drought finally breaks. Fear of disruption by fast-growing AI systems drives defensive acquisitions exceeding $25b as incumbents buy rather than build.

3. Vector databases resurge as essential infrastructure in the AI stack.

Multimodal models & world/state-space models demand new data architectures. Vector databases grow revenue explosively as they become the connective tissue between foundation models & enterprise data.

4. AI models execute tasks autonomously for longer than a workday.

According to METR, AI task duration doubles every 7 months.[2] Current frontier models reliably complete tasks taking people about an hour. Extrapolating this trend, by late 2026, AI agents will autonomously execute 8+ hour workstreams, fundamentally changing how companies staff projects.
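The extrapolation is a simple exponential. Here is a minimal sketch, assuming a clean doubling curve & treating the starting horizon & doubling time as inputs; the function names are mine, not METR’s.

```python
import math


def horizon_hours(start_hours: float, months_elapsed: float, doubling_months: float = 7.0) -> float:
    """Task horizon after some months, if it doubles every `doubling_months`."""
    return start_hours * 2 ** (months_elapsed / doubling_months)


def months_to_reach(target_hours: float, start_hours: float, doubling_months: float = 7.0) -> float:
    """Months needed for the horizon to grow from start_hours to target_hours."""
    return doubling_months * math.log2(target_hours / start_hours)


# Inputs quoted in the text: a one-hour horizon today, doubling every 7 months.
months_needed = months_to_reach(target_hours=8, start_hours=1)
after_one_year = horizon_hours(start_hours=1, months_elapsed=12)

print(f"Months to reach 8 hours: {months_needed:.0f}")     # Months to reach 8 hours: 21
print(f"Horizon after 12 months: {after_one_year:.1f} h")  # Horizon after 12 months: 3.3 h
```

The result is sensitive to both inputs. From a one-hour start at a 7-month doubling, an 8-hour horizon is three doublings away, so reaching it by late 2026 depends on a longer starting horizon or a faster doubling time.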

5. AI budgets receive scrutiny for the first time.

Buying committees & boards push back on AI spending. Small language models & open-source alternatives rise in popularity as research labs determine how to specialize them for particular tasks, achieving state-of-the-art performance at a fraction of the cost. Developers prefer them for 10x cost reductions.

6. Google distances itself from competitors via breadth in AI.

No other company achieves breakthroughs across as many domains : frontier models, on-device inference, video generation, open-source weights, & search integration. Google sets the pace, forcing OpenAI, Anthropic, & xAI to specialize in response. The era of every lab competing on every frontier ends.

7. Agent observability becomes the most competitive layer of the inference stack.

Engineering observability, security observability, & data observability fuse into a single discipline. Agents require unified visibility across code execution, threat detection, & data lineage. This marks the beginning of the confluence I predicted in 2025 : the three observability spaces finally converge.

8. 30% of international payments are issued via stablecoin by December.

Instant settlement & cross-border payments drive massive adoption. As regulatory clarity improves in major markets, stablecoins move from the periphery of crypto to the core of global trade finance, displacing traditional SWIFT rails for a significant portion of B2B volume.

9. Agent data access patterns stress & break existing databases.

Agents issue at least an order of magnitude more queries to databases & data lakes than people ever have. This surge in concurrency & throughput requirements forces a redesign of the overall architecture for both transactional & analytical databases to handle the relentless demand of autonomous systems.

10. The data center buildout reaches 3.5% of US GDP in 2026.

The scale of investment mirrors the historical expansion of the railroads. The only factor that slows overall building is perceived risk within the credit market, particularly in the private credit market. The massive growth in that asset class suddenly shows strains of increasing default rates, creating a potential bottleneck for the most capital-intensive infrastructure projects.

11. The web flips to agent-first design.

Most developer documentation & many websites become agent-first rather than people-first. This shift occurs because many purchasing decisions are now informed first through agentic research. Consequently, the front door needs to be designed for robots, while the side door caters to people.

12. Cloudflare becomes the gatekeeper for agentic payments.

The x402 protocol revives HTTP’s long-dormant 402 “Payment Required” status code, enabling AI agents to pay for API access in real-time.[3] Cloudflare’s position as the web’s infrastructure layer makes it the natural chokepoint for this new commerce. This concentration becomes a flashpoint as a few giants push Cloudflare to be more open.
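To make the request, pay, retry pattern concrete, here is a toy sketch with no network calls. It is not the x402 specification: the header names, token format, & price are hypothetical stand-ins, & only the use of the 402 status code comes from the text above.

```python
from dataclasses import dataclass, field

# Hypothetical names for illustration; not taken from the x402 spec.
PAYMENT_HEADER = "X-Demo-Payment"
PRICE_HEADER = "X-Demo-Price-Cents"
PRICE_CENTS = 5


@dataclass
class Response:
    status: int
    body: str
    headers: dict = field(default_factory=dict)


def paid_api(headers: dict) -> Response:
    """Toy server: answer 402 with a price until a matching payment token arrives."""
    if headers.get(PAYMENT_HEADER) != f"paid:{PRICE_CENTS}":
        return Response(402, "Payment Required", {PRICE_HEADER: str(PRICE_CENTS)})
    return Response(200, "forecast: sunny")


def agent_fetch(budget_cents: int) -> Response:
    """Toy agent: call the API, pay if asked & affordable, then retry once."""
    first = paid_api({})
    if first.status != 402:
        return first
    price = int(first.headers[PRICE_HEADER])
    if price > budget_cents:
        return first  # too expensive: surface the 402 to the caller
    return paid_api({PAYMENT_HEADER: f"paid:{price}"})


print(agent_fetch(budget_cents=10).status)  # 200
print(agent_fetch(budget_cents=1).status)   # 402
```

In a real deployment the payment token would be a verifiable payment proof rather than a plain string, & whichever intermediary verifies it sits in the chokepoint position described above.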

2026 is the year enterprises productionize AI.


Thursday is Podcast Day

2025-12-18 08:00:00

Thursday is the new Monday for podcast producers. Analyzing hundreds of tech & VC podcasts over six weeks this fall through MotherDuck’s MCP server, I found 42% of episodes drop mid-week, between Thursday & Friday. Sunday? A publishing desert at just 3%.

Here’s what else the data reveals.

When Episodes Drop

Prompt: Count the number of podcasts published on each day of the week with %

[Chart: podcast episodes by day of week]

The pattern makes sense : listeners consume podcasts during commutes & workouts early in the week, so publishing mid-week ensures fresh content when people are ready to listen.
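The analysis itself ran as natural-language prompts against MotherDuck’s MCP server. For readers who prefer to check this kind of breakdown by hand, here is a small Python sketch of the same day-of-week count. The publish dates are hypothetical & the function name is mine.

```python
from collections import Counter
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_share(publish_dates):
    """Return {day name: share of episodes}, in Monday-to-Sunday order."""
    counts = Counter(DAYS[d.weekday()] for d in publish_dates)
    total = len(publish_dates)
    return {day: counts[day] / total for day in DAYS if counts[day]}


# Hypothetical publish dates, for illustration only.
sample = [
    date(2025, 12, 11),
    date(2025, 12, 15),
    date(2025, 12, 17),
    date(2025, 12, 18),
    date(2025, 12, 19),
]

shares = weekday_share(sample)
for day, share in shares.items():
    print(f"{day:<9} {share:.0%}")
```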

Content Depth Varies Dramatically

Prompt: Calculate the average, minimum, & maximum word count of the podcast summaries.

[Chart: podcast summary word counts]

The range of summary lengths is quite narrow. Between the 25th & 75th percentile, there’s only a paragraph’s worth of difference, suggesting the market has discovered an optimal length.

A Typical Podcast Releases At Least Three Episodes Per Month

Prompt: Show me the number of podcasts per month.

[Chart: podcasts per month]

The median podcast releases three episodes per month. The outliers? Daily shows publishing multiple episodes per day.

If you’re thinking about launching a podcast, it’s probably worth testing releasing episodes on Sundays, perhaps early in the morning or around coffee time.

These types of analyses are becoming simpler & simpler with technologies like MCP.

A Flash of Deflation

2025-12-17 08:00:00

Gemini 3 Flash represents a step function increase in model deflation : a gauntlet thrown.

Google’s latest model underprices the state of the art by 70% to 79%, with very similar levels of performance. At $0.50 per million input tokens & $3.00 per million output tokens, Gemini 3 Flash hovers within 9% of the best scores across 20 benchmarks.[1]

How much better is the price-performance? How much cheaper can teams run inference?

First, performance. Gemini 3 Pro tops this analysis with a 6% deviation from state of the art. Gemini 3 Flash is not far behind at 9%, followed by Opus at 12%.

Average Percentage Delta from SOTA by Model

Second, input price-performance. Looking at overall price-performance by input tokens reveals a huge gap. Gemini 3 Flash delivers 182 performance points per dollar compared to GPT-5.2 at 53. That’s a 3.4x advantage.

Third, output price-performance. The spread widens further. Gemini 3 Flash scores 30.3 performance points per output dollar versus Claude Opus 4.5’s 3.5 points. Claude charges $25 per million output tokens : 8x more than Gemini’s $3 : while scoring 2.9% lower on aggregate benchmarks.[2][3]

Gemini 3 Flash Output Price-Performance
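The price-performance figures are the composite performance score divided by the price per million tokens. A short sketch using the scores & prices from the table in this post (the variable names are mine):

```python
# (performance score, input $/MTok, output $/MTok), as listed in the table.
MODELS = {
    "Gemini 3 Flash": (90.8, 0.50, 3.00),
    "GPT-5.2": (92.6, 1.75, 14.00),
    "Gemini 3 Pro": (93.8, 2.00, 12.00),
    "Claude Sonnet 4.5": (71.8, 3.00, 15.00),
    "Claude Opus 4.5": (88.2, 5.00, 25.00),
}


def points_per_dollar(score: float, price_per_mtok: float) -> float:
    """Performance points bought by one dollar of tokens at the given price."""
    return score / price_per_mtok


input_ppd = {name: points_per_dollar(s, p_in) for name, (s, p_in, _) in MODELS.items()}
output_ppd = {name: points_per_dollar(s, p_out) for name, (s, _, p_out) in MODELS.items()}

for name in MODELS:
    print(f"{name:<18} input: {input_ppd[name]:6.1f}  output: {output_ppd[name]:5.1f}")

advantage = input_ppd["Gemini 3 Flash"] / input_ppd["GPT-5.2"]
print(f"Flash vs GPT-5.2 on input: {advantage:.1f}x")  # Flash vs GPT-5.2 on input: 3.4x
```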

Gemini 3 Flash compresses what took $65 of GPT-4 tokens in March 2023 into $1.10 today for equivalent capability.[4] That’s 98% deflation in 33 months. For teams building AI products, this isn’t incremental improvement : it’s a category shift that makes previously uneconomic use cases viable.
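The deflation figure can be rebuilt from the numbers in the footnote: a blended token price (80% input, 20% output) divided by a composite benchmark score. In this sketch the GPT-4 blended price of $36 & its 0.55 composite are taken as given from the footnote; the function names are mine.

```python
def blended_price(input_price: float, output_price: float, input_weight: float = 0.8) -> float:
    """Blended $/MTok, weighting input tokens at 80% & output tokens at 20% by default."""
    return input_weight * input_price + (1 - input_weight) * output_price


def cost_per_unit(blended: float, composite: float) -> float:
    """Dollars per unit of composite benchmark performance."""
    return blended / composite


gpt4_cost = cost_per_unit(36.0, 0.55)                        # figures from the footnote
flash_cost = cost_per_unit(blended_price(0.50, 3.00), 0.908)
deflation = 1 - flash_cost / gpt4_cost

print(f"GPT-4 (Mar 2023):     ${gpt4_cost:.2f}")   # GPT-4 (Mar 2023):     $65.45
print(f"Gemini 3 Flash:       ${flash_cost:.2f}")  # Gemini 3 Flash:       $1.10
print(f"Deflation:            {deflation:.1%}")    # Deflation:            98.3%
```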

| Model | Performance Score | Avg % From SOTA | Input $/MTok | Output $/MTok |
| --- | --- | --- | --- | --- |
| Gemini 3 Flash | 90.8 | -9.2% | $0.50 | $3.00 |
| GPT-5.2 | 92.6 | -7.4% | $1.75 | $14.00 |
| Gemini 3 Pro | 93.8 | -6.2% | $2.00 | $12.00 |
| Claude Sonnet 4.5 | 71.8 | -28.2% | $3.00 | $15.00 |
| Claude Opus 4.5 | 88.2 | -11.8% | $5.00 | $25.00 |

The pace of model releases in Q3 & Q4 has been relentless. The performance improvements have shattered expectations. What’s even more unbelievable : a 70% to 80% discount on that performance within weeks of release.

Google is leading the deflation in AI pricing, selling tremendous performance at going-out-of-business prices.


State of the art (SOTA) : the best score achieved by any model on a given benchmark, used as the 100% baseline. Performance score = 100 + average percentage delta from SOTA.
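As a rough illustration of that definition, the sketch below computes the composite score from per-benchmark results. It assumes higher scores are better on every benchmark & that every model has a score on every benchmark. The benchmark & model names & the numbers are made up for the example.

```python
from statistics import mean


def composite_scores(results: dict) -> dict:
    """results maps benchmark -> {model: score}; returns model -> 100 + average % delta from SOTA."""
    deltas = {}
    for scores in results.values():
        sota = max(scores.values())  # best score on this benchmark is the 100% baseline
        for model, score in scores.items():
            deltas.setdefault(model, []).append((score - sota) / sota * 100)
    return {model: 100 + mean(d) for model, d in deltas.items()}


# Hypothetical benchmark results, for illustration only.
example = {
    "bench_a": {"model_x": 80, "model_y": 100},
    "bench_b": {"model_x": 50, "model_y": 45},
}

composite = composite_scores(example)
for model, score in composite.items():
    print(f"{model}: {score:.1f}")
# model_x: 90.0
# model_y: 95.0
```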

SOTA Delta Heatmap by Model and Benchmark


  1. Methodology : Measuring overall price-performance is hard. I used a heuristic : percentage delta from state of the art across 20 different benchmarks, run across the five most recent frontier models (Gemini 3 Flash, Gemini 3 Pro, GPT-5.2, Claude Opus 4.5, Claude Sonnet 4.5). Each model’s score on each benchmark is compared to the best score achieved by any model on that benchmark. The percentage delta is then averaged across all benchmarks to produce a composite performance score. ↩︎

  2. Claude Opus 4.5 output pricing ($25/MTok) divided by Gemini 3 Flash output pricing ($3/MTok) = 8.3x. Performance score difference : Gemini 3 Flash (90.8) - Claude Opus 4.5 (88.2) = 2.6 points, or 2.9% lower relative to Gemini 3 Flash. ↩︎

  3. Correction (Dec 17, 2025) : Updated “2.6% lower” to “2.9% lower” to reflect the correct percentage calculation (2.6 points / 90.8 = 2.86%). ↩︎

  4. Cost per unit of performance calculated as blended token price (80% input + 20% output) divided by composite benchmark score. GPT-4 (March 2023) : $36 blended / 0.55 composite = $65.45. Gemini 3 Flash (Dec 2025) : $1.00 blended / 0.908 composite = $1.10. Deflation : ($65.45 - $1.10) / $65.45 = 98.3%. ↩︎