2025-09-30 08:00:00
I discovered I was designing my AI tools backwards.
Here’s an example. My newsletter processing chain read emails, called a newsletter processor, extracted companies, & then added the companies to the CRM. Four separate steps, costing $3.69 for every thousand newsletters processed.
Before: Newsletter Processing Chain
# Step 1: Find newsletters (separate tool)
ruby read_email.rb --from "[email protected]" --limit 5
# Output: 340 tokens of detailed email data
# Step 2: Process each newsletter (separate tool)
ruby enhanced_newsletter_processor.rb
# Output: 420 tokens per newsletter summary
# Step 3: Extract companies (separate tool)
ruby enhanced_company_extractor.rb --input newsletter_summary.txt
# Output: 280 tokens of company data
# Step 4: Add to CRM (separate tool)
ruby validate_and_add_company.rb startup.com
# Output: 190 tokens of validation results
# Total: 1,230 tokens, 4 separate tool calls, no safety checks
# Cost: $3.69 per 1,000 newsletter processing workflows
Then I created a unified newsletter tool that combined everything using the Google Agent Development Kit, Google’s framework for building production-grade AI agent tools:
# Single consolidated operation
ruby unified_newsletter_tool.rb --action process \
--source "techcrunch" --format concise \
--auto-extract-companies
# Output: 85 tokens with all operations completed
# 93% token reduction, built-in safety, cached results
# Cost: $0.26 per 1,000 newsletter processing workflows
# Savings: $3.43 per 1,000 workflows (93% cost reduction)
Why is the unified newsletter tool more complicated?
It includes multiple actions in a single interface (process, search, extract, validate), implements state management that tracks usage patterns & caches results, has rate limiting built in, & produces structured JSON outputs with metadata instead of plain text.
But here’s the counterintuitive part: despite being more complex internally, the unified tool is simpler for the LLM to use because it provides consistent, structured outputs that are easier to parse, even though those outputs are longer.
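The mechanics are easy to sketch. Here’s a minimal Ruby illustration of the pattern, not the actual tool: the class name, actions, rate limit, & return fields are all invented for the example.

```ruby
require "time"

# Illustrative sketch of a consolidated tool: one entry point dispatches to
# several actions & always returns the same JSON-style envelope.
class UnifiedNewsletterTool
  MAX_CALLS_PER_MINUTE = 30

  def initialize
    @cache = {}           # action + args -> prior envelope
    @call_timestamps = [] # for naive rate limiting
  end

  def run(action, **args)
    enforce_rate_limit!
    key = [action, args].inspect
    return @cache[key].merge("cached" => true) if @cache.key?(key)

    result = case action
             when :process then { "source" => args[:source], "summary" => "..." }
             when :extract then { "companies" => [] }
             else { "error" => "unknown action #{action}" }
             end

    envelope = {
      "action" => action.to_s,
      "data" => result,
      "cached" => false,
      "timestamp" => Time.now.utc.iso8601
    }
    @cache[key] = envelope
    envelope
  end

  private

  def enforce_rate_limit!
    now = Time.now
    @call_timestamps.reject! { |t| now - t > 60 }
    raise "rate limit exceeded" if @call_timestamps.size >= MAX_CALLS_PER_MINUTE
    @call_timestamps << now
  end
end
```

The first call for a given source misses the cache; an identical second call returns the cached envelope, which is where the repeated-workflow savings come from.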
To quantify the impact, we ran 30 iterations per test scenario. The results show the effect of the new architecture:
| Metric | Before | After | Improvement |
|---|---|---|---|
| LLM Tokens per Op | 112.4 | 66.1 | 41.2% reduction |
| Cost per 1K Ops | $1.642 | $0.957 | 41.7% savings |
| Success Rate | 87% | 94% | 8% improvement |
| Tools per Workflow | 3-5 | 1 | 70% reduction |
| Cache Hit Rate | 0% | 30% | Performance boost |
| Error Recovery | Manual | Automatic | Better UX |
We reduced tokens by 41% (p=0.01, statistically significant), which translated linearly into cost savings. The success rate improved by 8% (p=0.03), & the 30% cache hit rate delivered additional savings.
While individual tools produced shorter, “cleaner” responses, they forced the LLM to work harder parsing inconsistent formats. Structured, comprehensive outputs from unified tools enabled more efficient LLM processing, despite being longer.
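A toy illustration of that difference, with both outputs invented for the example: free text needs a bespoke regex per tool & per format change, while a JSON envelope parses the same way everywhere.

```ruby
require "json"

# Two hypothetical tool outputs carrying the same fact.
plain = "Found company: Acme Corp (acme.com), confidence high"
structured = '{"companies":[{"name":"Acme Corp","domain":"acme.com"}]}'

# Free text: one brittle regex per tool, broken by any format drift.
m = plain.match(/company: (?<name>.+?) \((?<domain>\S+)\)/)
from_text = { "name" => m[:name], "domain" => m[:domain] }

# Structured envelope: the same JSON.parse call works for every tool.
from_json = JSON.parse(structured)["companies"].first
```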
My workflow relied on dozens of specialized Ruby tools for email, research, & task management, each with its own interface, error handling, & output format. Rolling them up into meta tools improved overall performance & delivered tremendous cost savings. You can find the complete architecture on GitHub.
2025-09-29 08:00:00
“The way to do a piece of writing is three or four times over, never once.”
Writing is hard. John McPhee, whose literary nonfiction reads like a novel, developed a four-draft method that transforms chaotic ideas into compelling narratives.
McPhee pioneered creative nonfiction at The New Yorker, writing books like Oranges & Coming into the Country that made complex subjects fascinating through storytelling. His approach differs from traditional journalism by incorporating fiction techniques while maintaining factual accuracy. His prose combines vivid imagery with economy:
“The doctor listens in with a stethoscope and hears sounds of a warpath Indian drum.”
He favored directness:
“He liked to go from A to B without inventing letters between.”
About his genre, McPhee said:
“Nonfiction—what the hell, that just says, this is nongrapefruit we’re having this morning.”
McPhee later codified his approach in Draft No. 4: On the Writing Process, sharing decades of writing wisdom.
His organizational philosophy shapes everything:
“You can build a structure in such a way that it causes people to want to keep turning pages. A compelling structure in nonfiction can have an attracting effect analogous to a story line in fiction. Readers are not supposed to notice the structure. It is meant to be about as visible as someone’s bones.”
McPhee’s Four-Draft Framework:
This is one of the best techniques I’ve found for writing. The method works because it separates creative thinking from critical evaluation. When you try to write perfect prose while generating ideas, it’s easy to fall into creative block.
Each draft becomes the foundation for the next, creating a recursive process that transforms chaotic thoughts into structured narratives. Like peeling back the layers of an orange to reveal the fruit within, each draft strips away what doesn’t belong, revealing the essential story that was always there waiting to be discovered.
2025-09-26 08:00:00
Every portfolio manager knows the efficient frontier: the set of optimal portfolios offering maximum return for a given level of risk. What if AI prompts had their own efficient frontier?
As we all start to use AI, prompt optimization will be a consistent challenge. GEPA (Genetic-Pareto) is a technique to discover the equivalent efficient frontier for AI prompts.
The paper’s initial results are promising: a 10-point improvement on certain benchmarks & prompts up to 9.2 times shorter. Shorter prompts matter because input tokens are the biggest driver of cost (see The Hungry, Hungry AI Model). So I implemented GEPA in EvoBlog.
To use GEPA, we must identify the scoring axes that an LLM uses to score a post. Here are mine:
| Evaluation Axis | Weight | Description |
|---|---|---|
| Style Match | 25% | How well the post matches Tom Tunguz’s distinctive writing style |
| Argument Quality | 20% | Strength and logic of the arguments presented |
| Data Usage | 15% | Effective use of statistics, examples, and quantified metrics |
| Readability | 15% | Clarity, sentence structure, and ease of reading |
| Originality | 15% | Fresh perspectives, novel connections, avoiding clichés |
| Engagement | 10% | Hooks, emotional language, reader involvement |
Now that we have this framework, we can enter a prompt to generate a blog post & have the EvoBlog system iterate through different prompts to approach the efficient frontier for each dimension, weighted across all variables, not just one.
Here are the scores for two hypothetical blog posts. You can see one spikes more on style, while the other one focuses on data usage. Using GEPA, we can determine which is the better all-around post. In this case, it is the data-focused post.
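The weighted aggregation itself is straightforward. A sketch in Ruby, using the rubric weights above & invented axis scores (0–10) for two hypothetical posts, one style-heavy & one data-heavy:

```ruby
# Weights from the evaluation rubric; scores below are invented examples.
WEIGHTS = {
  style: 0.25, argument: 0.20, data: 0.15,
  readability: 0.15, originality: 0.15, engagement: 0.10
}.freeze

# Weighted sum across all axes, rounded for readability.
def weighted_score(scores)
  WEIGHTS.sum { |axis, weight| weight * scores.fetch(axis) }.round(2)
end

style_post = { style: 9, argument: 7, data: 5, readability: 8, originality: 7, engagement: 7 }
data_post  = { style: 7, argument: 8, data: 9, readability: 8, originality: 7, engagement: 7 }

weighted_score(style_post)  # 7.35
weighted_score(data_post)   # 7.65 -> the data-heavy post wins overall
```

Even though the style-heavy post spikes on its strongest axis, the weighted total rewards the better all-around post.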
All of this to say, dear reader, that I’ve only ever published one blog post fully generated by AI.
My goal with these automated systems is to learn how they work, how to tune them, & generate initial drafts that approximate my first & second drafts. I will always be completing drafts three & four.
The efficient frontier is no substitute for insight & an authentic voice.
2025-09-25 08:00:00
How long & how quickly can a business compound?
This is a question every investor asks of every business, public or private.
In the 2010s, Slack & Atlassian became titans. On the day Salesforce announced its intent to acquire Slack, it was equally valuable to Atlassian at ~$27b.
The revenue curves look similar in the out years, with similar growth rates. Atlassian continues to compound at massive scale.
But the time from founding to $1b in revenue differs by a decade: 17 vs. 7 years.
To create value, a startup must grow quickly & at scale, or grow consistently over a long period of time. AI companies today are growing very quickly. The T3D2 companies can grow at a slower rate over a longer period of time to achieve the same market cap.
Compare OpenAI’s 400% growth at $1b revenue to Atlassian’s 30%. Or Snowflake at 124%. Snowflake is $75b market cap today, Atlassian $42b. The advantage of a head of steam is clear.
While both paths, steady compounding & hypergrowth, can lead to the same destination, the latter creates more value because of the time value of money. The sooner a startup reaches $1b in revenue, the more valuable it is.
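A back-of-envelope sketch makes the point. Discounting at an assumed 10% annual rate (the rate is my assumption, not from the post), $1b of revenue reached in year 7 is worth more than 2.5x the same revenue reached in year 17:

```ruby
# Present value of a future amount, discounted at an assumed annual rate.
def present_value(amount, years, rate = 0.10)
  (amount / (1 + rate)**years).round
end

fast = present_value(1_000, 7)   # $1b (in $m) reached in 7 years  -> ~$513m today
slow = present_value(1_000, 17)  # $1b reached in 17 years         -> ~$198m today
```

Any positive discount rate yields the same ordering; a higher rate only widens the gap.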
Of course, a hypergrowth company with significant churn isn’t worth very much at all. The CAP theorem equivalent in business is some combination of growth, margin, & retention. Most businesses can’t optimize for all three.
2025-09-23 08:00:00
OpenAI hit $12 billion ARR within five years of ChatGPT’s launch [1]. Anthropic reached $200 million in revenue in January 2024 [2]. Meanwhile, Salesforce took ten years to reach $1 billion ARR [3].
Does this mean the T3D2 framework (triple-triple-triple-double-double ARR to go public), originally outlined by Neeraj Agrawal, which provides a clear path to IPO-scale revenue, is dead?
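As a refresher, the T3D2 path compounds quickly. A quick Ruby sketch, assuming a $2m ARR base year (the base is my assumption; the framework specifies only the multiples):

```ruby
# T3D2: triple, triple, triple, double, double annual ARR growth.
def t3d2_path(start_arr)
  [3, 3, 3, 2, 2].each_with_object([start_arr]) do |multiple, path|
    path << path.last * multiple
  end
end

t3d2_path(2)  # [2, 6, 18, 54, 108, 216] in $m
```

A $2m base reaches roughly $216m of ARR after the five steps, which is what made T3D2 a credible path to IPO scale.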
There’s no doubt that AI companies have grown at unprecedented rates. If we understand these fundamental drivers, we can better assess how sustainable this growth is.
Management teams & boards are insisting on AI transformation. ROI is still early. This urgency is captured by Larry Page’s recent quote: “I am willing to go bankrupt rather than lose this race.” The same mentality drives aggressive experimentation across multiple vendors & rapid buying decisions, from hyperscalers to mid-market businesses. Will the end of this era lead to churn?
AI can automate labor. The overall cost savings can be significantly greater than workflow optimization tools of the previous era. As a result, the contract sizes for many AI products are significantly larger to start. If this continues, the overall bookings model & AE productivity model also need to change. If AI can deliver on the promise, larger contract sizes may be the norm.
Incumbents are defensive. The prizes in AI are huge. The initial curiosity around AI & the massive growth rates of some companies has led some incumbents to turn defensive, blocking access to their data. More defensibility might also curtail on-platform growth.
The more sustainable AI’s growth rates & customer retention prove to be, the more challenging the T3D2 benchmark becomes, because the market’s expectations of growth will rise. But for now, it’s too soon to tell whether the contract sizes & durability are long-term characteristics of this market.
For businesses currently on the T3D2 plan, the fundraising market may be a bit more challenging because of the comparison to AI growth rates. The expectations around size at IPO have also increased.
$100m in trailing revenue growing 50% used to be the target. But now the expectation is closer to $300m growing at 50%. Attaining those numbers requires sustaining high growth rates for longer.
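How much longer is that? At a constant 50% growth rate, moving the bar from $100m to $300m means solving 1.5^n = 3, which works out to nearly three extra years of sustained hypergrowth:

```ruby
# Years of 50% compounding needed to triple revenue: solve 1.5**n == 3.
extra_years = (Math.log(3) / Math.log(1.5)).round(1)  # ~2.7 years
```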
2025-09-19 08:00:00
We build teams in pyramids today. One leader, several managers, many individual contributors.
In the AI world, what team configuration makes the most sense? Here are some alternatives:
First, the short pyramid. Managers become agent managers. The work performed by individual contributors of yore becomes the workloads of agents. Everyone moves up a level of abstraction in work.
This configuration reduces headcount by 85% (1:7:49 -> 1:7). The manager to individual contributor ratio goes from 1:7 to 1:1. The manager to agent ratio remains 1:7.
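The arithmetic, with the classic 1:7:49 pyramid as the baseline & agents not counted as headcount:

```ruby
# Classic pyramid: 1 leader, 7 managers, 49 individual contributors.
classic = [1, 7, 49].sum  # 57 people

# Short pyramid: 1 leader, 7 agent managers; the IC work shifts to agents.
short = [1, 7].sum        # 8 people

reduction = ((classic - short) / classic.to_f * 100).round  # ~86%, in line with the figure cited
```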
Second, the rocket ship 🚀!
One director, seven managers, 21 employees. Everyone in the organization manages agents, but the agents reflect their seniority: the director manages an AI chief-of-staff, while the managers are player-coaches, both executing goals themselves & coaching others on how to use AI effectively, which cuts the span of control in half.
This configuration reduces headcount by roughly half (1:7:49 -> 1:7:21).
The future is not one-size-fits-all.
Here’s the twist: not every department in a company will adopt the same organizational structure. AI’s impact varies dramatically by function, creating a world where the shape of a company becomes more nuanced than ever.
Sales teams will likely maintain traditional pyramids or rocket ships. Relationships drive revenue, & human empathy, creativity, & negotiation skills remain irreplaceable. The classic span of control models still apply when trust & rapport are paramount.
R&D teams present the greatest opportunity for the short pyramid transformation. Code generation is AI’s first true product-market fit, generating 50-80% of code for leading companies.
Customer success & support might evolve into hybrid models: AI handles routine inquiries while humans manage complex escalations & strategic accounts. The traditional middle management layer transforms into something entirely new.
This evolution challenges everything we know about scaling teams effectively. The old wisdom of 6-7 direct reports breaks down when managers oversee both human reports & AI agents.
The recruiting burden that historically justified management hierarchies transforms too. Instead of finding & developing human talent, managers increasingly focus on configuring AI capabilities & optimizing human-AI collaboration.
If the company ships its org chart, what org chart do you envision for your team?