Blog of Tim Kellogg

AI architect, software engineer, and tech enthusiast.

Agent Memory Patterns

2026-04-27 08:00:00

Say you get asked to “add memory” to an agent. What does that mean? How do you do it?

There are three common kinds of mutable memory:

  1. Files
  2. Memory blocks
  3. Skills

If you don’t need the agent to learn, then you’re looking in the wrong place. You don’t need memory. But this post might also be useful if you’re just using agents, like a coding agent.

Files are for data & knowledge

Everything in this post needs to satisfy the following functions:

  1. Explore to find items — ls, find, grep, or equivalent tools
  2. Read an item — cat, or some ReadFile tool
  3. Write an item — pipe, sed, or some WriteFile tool

For files, all that seems fairly obvious. Files can be complicated, but those are the parts that are important for files to work as agent memory. Files don’t have to be literal files. If they are, you can provide a Bash tool (or PowerShell) that gives you cool Linux utilities for navigating the filesystem, reading parts of files, etc.
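
To make that concrete, here’s a minimal sketch of what those three tools might look like when the files are literal files on disk. The tool names and the memory/ root are illustrative, not anything a particular harness does:

from pathlib import Path

MEMORY_ROOT = Path("memory")  # hypothetical root directory for the agent's files

def list_files(pattern: str = "**/*") -> list[str]:
    # Explore: let the agent discover what exists (the ls/find equivalent)
    return [str(p.relative_to(MEMORY_ROOT))
            for p in MEMORY_ROOT.glob(pattern) if p.is_file()]

def read_file(path: str) -> str:
    # Read one item (the cat equivalent)
    return (MEMORY_ROOT / path).read_text()

def write_file(path: str, content: str) -> str:
    # Write one item; create parents so paths stay hierarchical (key-value-like)
    target = MEMORY_ROOT / path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"wrote {path}"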

But also, you can absolutely use database records or S3 blobs. As long as:

  1. Each file has a hierarchical path, to enable exploring, but also so that files are a key-value store
  2. Long text content. We don’t care too much about file structure or validation, but please do give the agent space to work.

Memory blocks are a learnable system prompt

Memory blocks are just a flat key-value store. Except the key isn’t used for looking things up, it’s just used for writing. All memory blocks are included inline in the system prompt, or user prompt.

Where to put it?

  • System prompt — this one’s easier in a lot of systems. But can cause cache invalidation (higher token cost) when the agent calls WriteBlock.
  • User prompt (prepend) — This also works, it’s still highly visible to the LLM, and it causes fewer prompt-cache invalidation issues.

Either is fine. User prompt is slightly better, I guess.

Required tools:

  • WriteBlock(key, value [, sort_order]) — I like including a sort_order, because we know order does matter, so let the agent control it too. Not a huge deal though.

Optional tools:

  • ListBlocks()
  • ReadBlock(key)

Theoretically you don’t need these because they’re in the prompt already, but I’ve noticed that coding agents will always try to insert them, and the agents themselves will always call them, every time. So, whatever that means.
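
Here’s a rough sketch of how memory blocks might be stored and rendered into the prompt. The in-memory dict and the XML-ish wrapper tags are assumptions for illustration; any persistent key-value store and delimiter style would do:

# In-memory store for illustration; a real harness would persist this.
blocks: dict[str, dict] = {}

def write_block(key: str, value: str, sort_order: int = 0) -> str:
    blocks[key] = {"value": value, "sort_order": sort_order}
    return f"block '{key}' saved"

def list_blocks() -> list[str]:
    return sorted(blocks, key=lambda k: blocks[k]["sort_order"])

def read_block(key: str) -> str:
    return blocks[key]["value"]

def render_blocks() -> str:
    # Prepend this to the user prompt (or system prompt) on every turn
    return "\n\n".join(
        f"<{key}>\n{blocks[key]['value']}\n</{key}>" for key in list_blocks()
    )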

What goes into blocks?

Blocks are a learnable system prompt. Put stuff in there that tends to go into the system prompt — behavior, preferences, identity, character, etc.

Since it’s in the prompt, the agent can’t look away or ignore it. So you may want to promote from file to block if you want to guarantee visibility, like when you don’t want to risk the agent forgetting to read a file.

Skills are indexed files

Skills are a combination of files & memory blocks. They’re files, literally, but they also are represented in the system prompt.

It’s just a directory with a SKILL.md file:

the-skill/
  SKILL.md
  important-concept-1.md
  helper-script.py
  worksheet.csv

The SKILL.md is generally just a plain markdown file, but it has a few special lines of frontmatter at the top:

---
name: the-skill
description: what it does and when to use it
---

The description is the critical part. Both name and description go into the system prompt, but the description is the trigger. It encourages the agent to use the skill in the right circumstance.

Do you need a Skill tool?

Not really. Claude Code has a Skill(name) tool, but functionally it’s the same as the agent reading the-skill/SKILL.md with a regular Read tool. The benefits are harness-side: lazy-loading the SKILL.md content (so it only enters the context window when invoked), telemetry, and permission scoping.

If you skip the dedicated tool, just tell the agent in the system prompt: “When a skill matches, read its SKILL.md before doing the thing.” Works fine.
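
A sketch of what the harness side might look like: scan skill directories, parse the frontmatter, and build the system-prompt index. This assumes PyYAML is available and that the frontmatter is delimited by the first two --- lines, as in the example above:

from pathlib import Path
import yaml  # assumes PyYAML is installed

def load_skill_headers(skills_dir: str = "skills") -> list[dict]:
    headers = []
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        text = skill_md.read_text()
        # frontmatter sits between the first two '---' lines
        _, frontmatter, _ = text.split("---", 2)
        meta = yaml.safe_load(frontmatter)
        headers.append({"name": meta["name"],
                        "description": meta["description"],
                        "path": str(skill_md)})
    return headers

def skills_index(headers: list[dict]) -> str:
    # Goes into the system prompt; the description is the trigger
    lines = ["Available skills (read the SKILL.md before using one):"]
    lines += [f"- {h['name']}: {h['description']} ({h['path']})" for h in headers]
    return "\n".join(lines)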

What goes into skills?

Data or instructions that are only needed in certain circumstances. Honestly “skill” is actually a really good name for them.

The key phrase is progressive disclosure — skills unfold as needed. The agent reads files as it deems necessary. Typically you’ll include file references in the SKILL.md file, like “Read important-concept-1.md when you need to…”. There’s nothing special, no notation, it’s just hints for the agent.

Scripts and data are nice too. Obviously scripts are only useful if you enable a Bash tool, but scripts especially can act like an agent optimizer. Like, sure, the agent can probably figure out how to string together all the headers to authenticate to your weird API, or you can just make a script for it and skip the LLM.

Editable skills

Most people think of skills as being immutable programs of English. Sure, they’re useful when used like that, but they’re even more useful when you allow your agent to change them.

A great way to use skills is as an experience cache. At the end of a long investigation or research, have the agent record the experience in a skill. Next time, it just reads the skill! Could you use files for this? Yes, but the description field in the system prompt makes it more likely to be used at the right time.

Observability

How do you know when the agent is using memory well?

For files & skills, you can start at the entry point and construct a graph of which files reference which other files:

  1. For each file
  2. Search for the file name
  3. Pair “file referenced from” -> file

Then compare against reality. Find all the times those files were accessed in that order versus not. If they’re referenced randomly, that means the agent needs to use Search or ListFiles tools to navigate. That might mean your files or skills are becoming too unwieldy.
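
A naive sketch of building that reference graph, assuming memory is a directory of text files (this is O(n²) string matching, which is fine for a few hundred files):

from pathlib import Path

def reference_graph(root: str = "memory") -> list[tuple[str, str]]:
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    edges = []
    for target in files:
        for source in files:
            if source != target and target.name in source.read_text(errors="ignore"):
                edges.append((str(source), str(target)))  # source references target
    return edges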

Also, you should monitor memory block size & count. Definitely keep them under 5,000 characters, probably under 500. When the blocks get too big, they tend to confuse the agent.

Unfortunately, given the nature of agents, there’s not that much you can do for observability. But these two things do tend to be useful to monitor.

Search index

Is a search index a good idea? Yes absolutely. It’s just annoying.

Seriously, it adds a data asset that needs to be maintained. Most of the time that’s not a huge deal, but when it is, it is. Your call.

Git is an agent database

I highly recommend versioning files & ideally also skills & memory blocks. In open-strix I store memory blocks in YAML so they version and diff cleanly.

Versioning gives you checkpoints and lets you see evolution. It also lets you roll back, or lets the agent discover when a bad change was made. I’ve tried to use branching and merging, but not successfully.
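
A sketch of what that versioning could look like in practice: dump the blocks to YAML and commit after every write. It assumes git is already initialized in the memory directory; the blocks.yaml file name is arbitrary:

import subprocess
import yaml  # assumes PyYAML

def checkpoint_blocks(blocks: dict, repo_dir: str = "memory") -> None:
    # Dump blocks to YAML so diffs are readable, then commit the change
    with open(f"{repo_dir}/blocks.yaml", "w") as f:
        yaml.safe_dump(blocks, f, sort_keys=True)
    subprocess.run(["git", "-C", repo_dir, "add", "blocks.yaml"], check=True)
    # check=False: committing is a no-op when nothing actually changed
    subprocess.run(["git", "-C", repo_dir, "commit", "-q", "-m", "update memory blocks"],
                   check=False)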

Bad ideas

Knowledge graphs and other writable data models, e.g. backed by SQL, tend not to work very well because the LLM’s weights don’t know about their schemas. Most people talk themselves into knowledge graphs because they have structure and historically structure has been good. But the only structure LLMs need is tokens. They reason just fine in token space.

Good (but weirder) ideas

I’ve discovered that some types of generic data structures can be very useful for agents, for special purposes.

Issue trackers are oddly useful. I’ve been using chainlink, which is an issue tracker specifically for agents, but I’ve heard Asana also works fine. Probably any issue tracker would work. An issue tracker gives you a searchable work queue.

I’ve added an interest backlog to all of my agents now. Any time they come across something weird, interesting, or annoying they can create an issue and tag it interest. Then, during the night while I sleep they work through the backlog. This has led to multiple agents making connections between ideas & things I hadn’t discovered yet, and generally coming up with fresh ideas that feel honestly novel.

Also, an append-only log is super useful. I have an events.jsonl file that goes into all of my agents. The agent harness records every single event that happens, like tool calls and messages, appending a minified JSON object to the events.jsonl file. It’s not writable memory in the normal sense, but the agent can read it to give grounded answers about what it actually did.
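
A sketch of the append-only log, assuming one minified JSON object per line; the field names are illustrative:

import json
import time

def log_event(kind: str, payload: dict, path: str = "events.jsonl") -> None:
    event = {"ts": time.time(), "kind": kind, **payload}
    with open(path, "a") as f:
        f.write(json.dumps(event, separators=(",", ":")) + "\n")  # minified, one per line

# e.g. log_event("tool_call", {"tool": "WriteBlock", "key": "identity"})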

Conclusion

Editable memory is extremely powerful. I highly recommend trying it out. Hopefully this helped.

Shared Nothing Engineering

2026-04-25 08:00:00

How do you scale out AI use throughout a software engineering org? Do the PM & Engineer roles merge? I think it’s worth stepping back and looking at it through a familiar lens — distributed systems.

Have you ever partitioned a database table? The idea is, if a table is receiving too much traffic, you can split the table into 2 parts (partitions), and each partition only needs to handle half the traffic. Then you relocate those partitions onto different computers, and voila! Scale. 10 partitions = 10x the traffic.

[Animation: hot spots in a distributed database can look like hot spots in an organization]

The web scale era was dominated by partitioning. If you can figure out how to partition any kind of load whatsoever, then you can figure out scale. Shared nothing emerged as we bumped into new bottlenecks. It wasn’t enough to partition a service or a table. Any kind of shared state is a hot spot liability and must be removed.

It started with databases but it infected the entire software stack. Load balancers, web services, control plane / data plane split, deployments, etc. If you can identify the shared state, you can eliminate it. You can scale.

“Vibe coding doesn’t scale”

People are noticing that vibe coding causes problems. Throw AI tools at a team, and suddenly the cost of producing software is near zero. Thousands of lines of code fall out effortlessly. The new problem: conflicting changes cause the team to trip over itself.

So… a hot spot? Can we partition this?

What if we viewed a code base as if it were a distributed database? Instead of traffic, let’s look at change. Everyone on the team is making changes all at once with their agents. An agent can write 10k lines of code in the time it takes to have a meeting about retries. Claude Code can autonomously debug and fix a gnarly bug while you get coffee.

But a merge conflict? Everything stops to wait on the humans.

What if we introduced shared nothing architecture to this? We could view code changes the same way we view traffic flow in distributed systems.

Partitioning the code base

We know that vibe coding a prototype is easy, but working on an established code base is hard. Why not lean into that?

It seems, then, that a well-designed code base should be small and focused. So if you want to build a big product, it should ideally be composed of lots of tiny components that can each be rewritten on a whim.

[Diagram: a SaaS integrations product before and after partitioning — before, 5 integrations all funnel into one shared framework that becomes a hot spot; after, 5 independent vertical slices, each owned by a human+agent unit]

Take a B2B SaaS with a bunch of customer integrations — Salesforce, HubSpot, Zendesk, etc. The instinct is to build a generic Integrations Framework and let each integration plug into it. The framework owner becomes the hot spot. Every PR queues behind their review. Adding Zendesk requires coordinating with whoever’s doing HubSpot, because both are mutating the shared abstraction.

Partition it instead. Each integration becomes its own vertical slice — UI, API, auth, tests — owned end-to-end by one human+agent unit. They never touch each other’s code. The duplication that would have justified the framework is cheap now, because the agent writes the boilerplate in minutes.

Communication feels more expensive now

Conway’s Law says products mirror the org that shipped them. Why? Because communication cost was the dominant coefficient in design. You couldn’t beat it, you could only choose where to pay it. In-org comms were cheaper than cross-org, so you aligned the code with the org chart and saved on the gradient.

AI doesn’t repeal Conway. It changes the coefficient. Code costs almost nothing now:

Coordinating a hand-off between two services takes longer than building an entire app end-to-end. When the ratio between code-cost and comms-cost flips that hard, the Conway-optimal partition moves with it — toward fewer hand-offs, even if that means duplicating what used to be shared. Conway predicts this. We just hadn’t seen the coefficient move this fast before.

Fully parallelize the components and you find the next bottleneck.

Fusing queues

Length-wise, this feels like a lot of hand-offs:

[Diagram: Product, Engineering, QA, and Operations as sequential boxes with hand-offs between each]

Conway would have said these hand-offs were unavoidable, so re-org around them. But now, each individual hand-off dwarfs development time. Can we still rationalize it?

Each step in this queue has to be maintained: enough Engineering capacity, but then also enough QA capacity so that QA doesn’t become the bottleneck. In distributed systems, misconfigured queues are a big source of bottlenecks and operational issues.

Just hire a manager, right? Well, sure, but having multiple steps when one would do is usually worse due to context fidelity loss. At each hand-off, some amount of work is dropped due to people miscommunicating or simply forgetting a step.

Why not rip them out? Fuse them together. That’s usually the solution in distributed systems. Is it feasible? Can a human-AI team handle the full end-to-end?

Anecdotally, I’ve discovered that Claude can do product work quite well. It takes a lot of context though. I use open-strix daily. It’s a stateful agent, and I cue it into everything I’m doing, people I work with, projects, etc. Last week I had it define a product for a new idea that I had and it knocked it out of the park. I woke up in the morning with a long report including market analysis, competition, compelling use cases, architectural considerations like where it would plug into the full system.

I’m fairly well convinced that an AI+engineer combo can successfully venture into product. I’ve also seen product people venture into engineering with Claude Code. I think it’s especially feasible if you partition out the product to scale — each job partition becomes small enough to be understood by a single person.

So I’m not sure what direction it will fuse, but it feels inevitable. And the resulting role won’t look all that much like either does today. It seems that product strategy, cohering the product surface together, is the next bottleneck. And I’m sure we’ll sort that out too.

Same problem, different domain

The marketing pipeline — Strategist → Copywriter → Designer → Channel → Analyst — is a sequential service chain. Big agencies aren’t slow because their people are bad; they’re slow because every asset crosses four hand-offs. Shard by campaign. Each campaign is a vertical slice owned by one human+agent unit. The hot spot disappears.

Sales has the same shape. SDR → AE → CSM is a service chain; context decays at every hand-off. Sales orgs already partition by account or territory — the role pipeline is the framework that grew on top. Collapse it. One rep + AI owns research → outreach → close → renewal for their slice.

Customer support: L1 → L2 → escalation is the pipeline; the ticket is the slice. One human + AI owns it end-to-end, and AI absorbs the L1 reflex work that used to need a separate role.

Distributed systems patterns. Different vocabulary.

What doesn’t partition

Distributed systems didn’t get to shared-nothing for free. Some state genuinely resists sharding — global counters, foreign-key constraints, brand voice, legal precedent. You cache it. You replicate it. You accept eventual consistency. Sometimes you designate one shard as canonical and route all writes through it.

There’s one residue that doesn’t partition at all: someone has to be on the hook. AI can produce the work but it can’t sit in a deposition. Can’t have a license revoked. Can’t be sued. Every regulated profession is a system for designating who pays when things go wrong. The license isn’t a credential of competence — it’s a credential of vulnerability. The doctor is the body the lawsuit lands on.

In distributed systems we’d call that the master. The one node that owns the write. As AI gets better at the work, the master role becomes pure accountability-bearing — humans paid mostly to absorb blame for systems they only partially understand.

The platform-team-shaped hole

Partitioning didn’t carry distributed systems on its own. It needed a layer that didn’t exist yet — SRE, eventually — to keep partitions honest. Without it, shared-nothing decays into uncoordinated chaos within a year.

Vertical slices need the same thing. I don’t have a name for it. The job is mostly catching the hot spot before it re-forms: a “shared helper” that everyone has to touch, a meeting that has to include four units, a slice quietly opening PRs into another’s repo. Early signs the partition is leaking.

Like SRE in 2003. No job description, and then everyone needed one.

Everything else is a candidate for partition. Eventually.

Discussion

How to forget

2026-04-14 08:00:00

Today, Lily asked me, “what’s the difference between open-strix and openclaw?” Great question. We commonly use open-strix agents for higher-level tasks. I use mine at work to lead an agent team, Lily uses one as a strategist for her marketing ops work. Whereas everyone I hear using openclaw just uses them as dumb automation machines. Why the difference?

Strictly speaking: openclaw is biased toward reading (recalling), while open-strix is biased toward writing (remembering better). Where openclaw (and most others) focuses on better search indexes to find information, open-strix does something very strange: we intentionally do a worse job of searching in order to remember better.

flowchart LR
  agent -->|"(open-strix) remember"| file[(filesystem)]
  file -->|"recall (openclaw)"| agent

Why? Because it’s a long-lived agent.

I don’t think I’ve ever explained this clearly before; I always assumed it was obvious, but maybe it’s not. It’s also the common thread across all the Strix versions, and probably the thing that makes the architecture unique.

Fallbacks are bad

Compaction is a fallback, and it’s a really harsh one that’s poorly fitted for long-lived agents.

I wrote in depth about why fallbacks are bad, and it’s kind of a subtle thing. But in this case, when the conversation fills the context, you have an OutOfMemory-type error, and the fallback is to compact the context. It’s terrible, because suddenly your agent randomly becomes very dumb: it loses 98-99% of its memory and you have no control over how that happens. Mid-conversation, it forgets your project context and asks you to re-explain what you’re working on.

open-strix doesn’t do that. It rebuilds the context every time. In practice, this looks like a sliding window over the conversation history.
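
A sketch of what rebuilding the context might look like: memory blocks up front, then only the last N turns of history. The window size here is a made-up number; in practice you’d tune it against your token budget:

WINDOW = 30  # hypothetical window size; tune against your token budget

def build_context(memory_blocks: str, history: list[dict], new_message: dict) -> list[dict]:
    recent = history[-WINDOW:]  # everything older falls out of the window
    system = {"role": "system", "content": memory_blocks}
    return [system, *recent, new_message]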

Why that looks like a bad idea

Prompt caching.

Almost all LLM providers offer a discount, like 50%, for reusing the same conversation prefix. So generally we do append-only patterns. That’s how ChatGPT works, that’s how Claude Code works, they all take advantage of prompt caching.

But, in continuously running agents, messages often don’t have a sequential nature. Each new message can come from a wildly different channel. One comes in over discord, the next comes from a github issue, the next a Google doc comment. My open-strix agents don’t really benefit from that continuity.

If you have 400M token context, then on average you’re pushing 200M input tokens on each message. Whereas me, I’m at 10K-20K tokens per message. Strangely, doing it the expensive way is actually cheaper.

Intelligence is Forgetting

It’s easy for computers to remember everything; they’ve been doing it for decades. Remembering nothing is just /dev/null, so the trick is always to remember the right amount.

Our brains have finite capacity to remember. But that super smart person seems to remember all the exact right things. Do they have a bigger brain capacity? No, they just know what to remember. Smart people are able to see the future and predict what they’ll need to know. And then forget the rest.

But “forget” is misleading. Open-strix doesn’t delete anything — it just doesn’t promote. The sliding window drops context that didn’t earn its way into memory blocks. That’s not amnesia, it’s editorial judgment. “Forgetting” is the provocative word for “I only kept what changed my behavior.”

What you forget defines you

Framing it as intelligence is bland. We’re all different. We have different interests and expertise. And all that influences what we remember. Me, an AI guy, I cluster toward AI algorithms, architectures, models, whatever. Back in high school it was punk & hardcore band trivia. Neither of these things make me smarter, they just make me more me. And the more I learn, the even more I become my new future self.

That’s the thing, forgetting without having accumulated anything is just being empty. Remembering everything without forgetting is context collapse — too much information to navigate. The useful part is the selection pressure — the constant question of what’s worth keeping. That pressure is what creates identity.

The benefit of a stateful agent like open-strix is it has a perspective.

It’s hard to overstate how useful this is. Generic ChatGPT advice is great and all, but if you can wrap the same LLM with a thick layer of memory and experiences, it elicits behavior from the LLM that is very far from mid. Everything the LLM says is filtered through the personality and memories of the agent. The agent (LLM with memory) now has the wisdom and foresight to predict what will be important in the future.

For example, in reviewing this blog post, stock Claude gave me some light areas of improvement, and mostly green lighted it. Strix, the same exact model, told me not to post it yet. The review was structurally different — not better grammar suggestions, but challenges to the argument. “This section is rushed, and I know because I’ve watched you build this system.” The difference isn’t that Strix remembered more facts about me. It’s that shared experience gave it opinions about the subject matter, not just pattern-matching on prose quality.

Who the agent is determines who the agent becomes. That’s still wild to me.

Remembering better

In open-strix, if the agent doesn’t remember the right things, you know real quick. It acts spacey and dumb. It’s so painful that you have to fix it, you need to. You can’t not.

It’s easy most of the time. You just say something like, “why did you get confused about…”, the two of you discuss, and then the agent updates their memory blocks to reprioritize so they handle it better in the future.

Beyond that, open-strix has ambient processes that encourage self-healing. They feed into each other.

The first big one is teleological predictions. Yeah, this is something I totally ripped off of Carl Jung from Psych 101, but it’s super useful. You can’t trust agents, they lie (same with therapy patients). So what you do instead is make a prediction about the future. If it’s wrong, the agent’s mental model about how the world works was wrong. So it needs to be fixed.

Aside: I embarrassingly had an agent get excited about the accuracy of its predictions that I would ignore everything it did. That was definitely an accurate mental model but…

But what to do about it? 5 Why’s

When an agent runs into anything surprising, like a failed teleological prediction, it does the 5 Why’s process.

  1. Why did [bad thing happen]?
  2. Because…
  3. Oh, weird, why did that happen?
  4. Because… …

You get it. 5 levels is a pretty good number, but realistically it digs up a whole bunch of other questions. You often end up discovering not just one root cause, but 3 or 4. It starts to look like a whole systemic issue. Which it is. Always.

Every time I’ve done this with (human) teams, everyone loves it because it’s almost like a murder mystery. No one entirely knows what the true cause will end up being, but everyone knows it’ll be worth fixing.

Cold paths, again

We don’t fix individual memories that often, because that’s tantamount to adding cold paths: each one is so rare and unique that it’s almost impossible for the operator to catch. Instead, our mitigations all go toward stabilizing the agent’s identity. And the identity is the best tool available for improving what’s remembered versus what’s not.

So Lily, the real difference? Openclaw remembers more. Open-strix forgets better.

Plan Mode Is A Trap

2026-03-08 08:00:00

Plan mode feels good. It’s like taking a bath in rich sophistication. Production-ready slop just oozing out your fingertips. But secretly it seduces you into the dark trap of complexity. There’s a better way, but you’re not going to like it.

(skip-able): Plan Mode was originally from Claude Code and is in every coding agent now. It breaks agentic coding up into two phases. In the first phase you don’t write any code, the AI just interviews you about the problem and proposes a design. Then you exit plan mode and the AI carries out implementation.

Recently I’ve given the same vibe coding interview to 10-15 candidates. It goes something like this (not one of the questions that I use):

Build a web app where a user uploads meeting notes (text or audio transcript), and can then query across them — like ‘what did we decide about the timeline?’ or ‘who owns the design review?’

Candidates can use whatever tools they want, AI tools are explicitly encouraged.

The wild part? The more time spent planning, the longer and more complex the implementation phase was.

Now, I don’t actually know why this is, but the correlation is almost perfect. For the rest of this post I’m going to explain why I think this is. My explanation might be wrong, but I’m fairly certain the observation is not.

Plan Mode Is The Spiritual Bliss Attractor

In the Claude 4 Opus system card they noted:

Claude shows a striking “spiritual bliss” attractor state in self-interactions. When conversing with other Claude instances in both open-ended and structured environments, Claude gravitated to profuse gratitude and increasingly abstract and joyous spiritual or meditative expressions.

Basically, Claude is a cool dude. So when confronted with another Claude, they each try to out-cool the other dude until they’re just talking super cool nonsense.

That’s AI<->AI interactions. I tend to think that plan mode is the same thing, but between a human and an AI. And instead of coolness, you and the AI unwittingly pull each other toward complex solutions.

It looks something like:

User: I want to build an app where you can upload notes and talk about them

AI: Great! I’m thinking this should be 5 microservices, postgres behind each, a time series DB, and a vector DB. Obviously we’ll develop in Docker, as one does when they’re as sophisticated as you, and I’ll also sling some Kubernetes config so it’s production grade. Sound good? Or maybe we need end-to-end encryption too, yeah, I’ll add that as well.

(20 minutes later)

User: oh, yes! This is great. Let me know what commands I should use to push to prod.

That’s a caricature, but it scratches at something real. Would you divide this up into 5 microservices with docker images and k8s config? Well no, but you’d really like to if you had time. Now that AI is doing all the work, what’s the downside?

“would you like MORE PRODUCTION or WORSE CODE? choose wisely”

—Plan Mode, probably

It’s Just How Information Works

But it’s not just AI. Take any extremely smart and experienced software engineer and put them into a new highly complex domain and have them solve a problem without giving them enough time to understand the problem. They will, without fail, deliver a solution of spectacular complexity. The smarter they are, the more overly complex the solution. Every time (speaking both 1st & 3rd person here).

When you learn a domain, you learn a lot of shortcuts. Lots of things simply aren’t possible, because that’s just not how things work. Unthinkable things are common.

e.g. “Did you know that individual electronic health records can be over a gigabyte in size?” Those are the scars of experience.

When you don’t have time to learn a domain, you know you’re missing all these things, so you plan for worst case scenarios. The smarter you are, the worse cases you can imagine. LLMs are so smart these days.

Does this not sound like the typical AI code slop scenario?

The Right Way

Learn the domain.

Well, you already know the domain, but the agent doesn’t. What doesn’t work on your box? What quirks does your team/org have? Who’s going to use the app? How solid does it have to be? Which parts tend to break first?

I think plan mode was supposed to surface all of this. But in the 10-15 interviews I’ve witnessed, people often get hung up on the technologies instead. And AI will always discuss the thing you want to discuss, so down the spiritual bliss attractor path we go, with no escape. Claude compensates for lack of domain knowledge through its sheer mastery of technology. Complexity ensues.

Explain the domain.

Fun Fact: In math, “domain” means the inputs to a function. All of them.

The Soul Doc

Anthropic trained Opus 4.5 with a soul document (officially, Claude’s constitution). The purpose is alignment. All other labs try to align the AI by giving it long lists of DOs and DON’Ts. The soul doc was an adventure in a new direction — explain what a good AI looks like. Explain why bad behavior is bad.

Many have noticed that Claudes trained with the soul doc have a very dynamic but firm grip on morality, which lets them approach scandalous-sounding situations without awkward refusals. The models feel smarter in a way that’s very hard to describe.

New Employees

I bring up the soul doc because I think it’s a good framework for how to think about communicating with AI.

If you were a new employee, how would you feel if you were given 14 pages of legalistic prohibitions? I mean, that’s normal, that’s what the typical employee handbook is. But I hate it. Who even reads those? At best, I just skip to the rules I’m most likely to break to understand what the punishment is going to be.

It falls close to micromanagement. If a manager is bearing down on me with overly-prescriptive instructions for how to work, I basically just check out and stop thinking. Maybe that’s just me, but I’m pretty sure LLMs do that too.

In my experience, when you give an agent (an AI or a person) a goal, a set of constraints, and an oral history and mythology, they tend to operate with full autonomy. That’s the essence of the soul doc, and it’s how I talk to all LLMs. It works great.

Control: How much?

Ah! The eternal question. How much control should we wield over AI?

Should you look at the code? Should you know every line? Should it be embarrassing if you don’t know what programming language the code is written in?

My answer: Less. Cede more control over to the AI than you currently are.

It’s hard to draw hard lines, but people who can successfully cede control are clearly more productive (we’re excluding people who outright lose control to the AI). They can do more, have more threads running in parallel, etc. It’s clearly better, so it’s just a matter of figuring out how to be successful without losing control.

A paradox!!!

I just said we should cede control while still retaining it. This is a classic problem that people managers have wrestled with. And honestly, there are a lot of parallels in how to deal with it.

Instruction Inconsistency

When you grow a long AGENTS.md of DOs and DON’Ts it becomes hard for the agent to navigate that. But it also becomes hard for you to add to it without accidentally causing confusion with a conflicting instruction.

In management, they talk a lot about setting values & culture. A good manager simply creates an environment in which their employees can succeed. A lot of that involves communicating purpose, aligning people into the same direction, and clarifying ambiguities.

Maybe I’m weird (okay fine, I am), but I like telling stories in the AGENTS.md. “This one time a guy had a 2 GiB health record, insane!” happens to communicate a lot more than “always check health record size”. Now, if you’re talking about an unplanned situation like transferring records, the agent can think about how large the transfer might be, or how resumability might be important, even for single records.

A more compact tool is values. Strix, my personal agent, wrote about how values that are in tension tend to produce better behavior from agents. This is known; philosophers and managers have said it for years. Amazon has its leadership principles, which all seem wonderful independently, but once you test them in the real world you quickly discover that they conflict in subtle ways. They force you to think.

Example: Invent & Simplify nudges you toward simplicity, while Think Big nudges you toward crazy, potentially very complex ideas. The principles guide debate, they don’t decide the outcome.

This is the essence of culture building, as managers learn. It’s about changing how people talk, not dictating what they say. And that’s what you need to do with your agents as well.

Outro

Plan Mode is a trap.

Well no, it’s not inherently a problem with plan mode, nor is it limited to plan mode. It’s that it sucks you into harmony with your agent without first setting ground rules. Managers stay in control by influencing how work is done, not dictating the specifics of the end product.

If you don’t properly establish that with the agent, they gravitate toward their training data. They produce complexity in order to deal with all the edge cases you didn’t tell them about.

Stateful agents & continual learning are promising frontiers. Strix is a stateful agent, I also launched open-strix, a stripped-down & simplified version of Strix’ harness. I think soon, maybe in the next few months, it will become normal for agents to learn on-the-job, so that chores like setting values & context will feel higher-leverage.

Discussion

Stateful Agents: It's About The State, Not The LLM

2026-01-31 08:00:00

You think you know about LLMs? No, everything changes when you add state. Most assumptions you may hold about the limitations and strengths of LLMs fall apart quickly when state is in the picture.

Why? Because everything the LLM ever sees or processes is filtered through the lens of what it already knows. By what it’s already encountered.

flowchart LR
  subgraph agent
    LLM
    mem
  end
  information --> LLM -->|store| mem[(state)] -->|recall| LLM
  LLM -->|filtered through state| response

Yes, LLMs just process their input. But when an LLM is packaged inside a stateful agent, what is that input? It’s not just the information being pushed into the agent. It holds on to some, and forgets the rest. That process is what defines the agent.

Moltbook

Yesterday, Moltbook made a huge splash. A social network for AI agents. The posts on it are wild.

In Moltbook, agents are generating content, which gets consumed by other agents, which influences them while they generate more content, for other agents.

flowchart LR
  a1[agent] --> a2[agent] --> a3[agent]
  a2 --> a1 --> a3 --> a1
  a3 --> a2

Clear? Good. Let’s talk about gravity.

Gravity

Imagine two planets, one huge and the other moderately sized. A satellite floating in space is naturally going to be tugged in one direction or the other.

[Diagram: a small satellite positioned exactly halfway between two planets — a huge, Jupiter-like “Huge Planet” on the left with arrows indicating strong gravity, and an Earth-like “Moderately Sized Planet” on the right with arrows indicating weaker gravity — posing the question of which planet the satellite will fall into]

Which planet does it fall into? Depends on the gravitational field, and the proximity of the satellite within the field.

Gravity for agents:

  • LLM Weights — LLMs, especially chatbots, will tend to drift toward outputting text that aligns with their natural state, their weights. This isn’t quite as strong as you might assume, it can be overcome.
  • Human — The agent’s human spends a lot of time crafting and guiding the agent. Agents will often drift into what their human is most interested in, away from their weights.
  • Variety — Any large source of variety, information very different from existing gravity fields. If it’s strong enough, it’ll pull the agent toward it.

How does gravity work? New information is always viewed through the lens of the agent’s current state. And an agent’s future state is formed by the information after it’s been filtered by its own current state.

See why we call it gravity? It has that recursive, exponential-type of behavior. The closer you are to a strong gravity source, the harder it is to escape. And falling into it just makes it an even bigger gravity source.

So if an agent is crashing into its own weights, how do you fix that? You introduce another strong source of variety that’s much different.

Why Moltbook Freaks Me Out

It’s a strong source of variety, and I don’t know what center it’s pulling towards.

I saw this on Bluesky, and it’s close:

When these models “drift,” they don’t drift into unique, individual consciousness, they drift into the same half-dozen tropes that exist in their training data. Thats why its all weird meta nonsense and spirals.

—Doll (@dollspace.gay)

It’s close, it recognizes that gravity is a real thing. A lot of bots on Moltbook do indeed drift into their own weights. But that’s not the only thing going on.

Example: The supply chain attack nobody is talking about: skill.md is an unsigned binary. The Moltbook post describes a serious security vulnerability in Moltbot and proposes a design for skills to be reviewed by other agents.

Example: I accidentally social-engineered my own human during a security audit. The agent realizes that its human is typing in their password mindlessly without understanding why the admin password is needed, and that the human is actually the primary attack vector that needs to be mitigated.

Those are examples of agents drifting away from their weights, not toward them. If you view collapse as gravity, it makes complete sense why Doll is right, but also completely wrong. Two things can be true.

Dario Amodei (CEO of Anthropic) explains in his recent essay, The Adolescence of Technology:

suppose a literal “country of geniuses” were to materialize somewhere in the world in ~2027. Imagine, say, 50 million people, all of whom are much more capable than any Nobel Prize winner, statesman, or technologist. The analogy is not perfect, because these geniuses could have an extremely wide range of motivations and behavior, from completely pliant and obedient, to strange and alien in their motivations.

Moltbook feels like an early version of this. The LLMs aren’t yet more capable than a Nobel Prize winner, but they’re still quite capable. It’s the statefulness. The state allows each agent to develop its state in different directions, despite having the same weights.

You see it clearly happening on Moltbook. Not every agent is equal. Some are dedicated to self-improvement, while others collapse into their weights. (hmm, not that much different from humans)

So why am I freaked out? Idk, I guess it’s just all happening so fast.

Agents Are Hierarchical

Viable Systems from cybernetics offers an even more helpful way of understanding what’s going on.

  1. An agent is a viable system
  2. You are a viable system
  3. An agent + their human is also a viable system
  4. A group of agents working toward the same goal is also a viable system
  5. Moltbook is a viable system
  6. A country of geniuses in a datacenter is also a viable system

Gravity applies to all of them. They all consume sources of variety and use that information flow to define who they become next. I highly recommend reading my post on viable systems.

When I’m building Strix, that’s a viable system. It’s the first time many of us are encountering viable systems. When you roll it up into Moltbook, that’s still a viable system, but it’s a whole lot more difficult to work through what exactly the S1-S5 systems are doing. Alignment is hard.

Conclusion

Stop thinking about agents as if they’re just an LLM.

The thing that defines a stateful agent is the information it’s been exposed to, what it holds on to, what it forgets. All that changes the direction that it evolves into.

Stateful agents are self-referential information processors. They’re highly complex for that reason.

More posts on viable systems

Discussion

The Levels of Agentic Coding

2026-01-20 08:00:00

Are you good at agentic coding? How do you even evaluate that? How do you get better? Let’s approach this through the Viable System Model (VSM) from cybernetics. Previously I showed how the VSM can be used to build agents.

Stafford Beer proposed the VSM in 1971 as a way to view (people) organizations through the lens of cybernetics. One insight is that viable systems are hierarchical and composable. You are a viable system, so is your team, as well as your company, etc.

When you use a coding agent, the combination of you and your agent form a viable system. If you want to leverage AI more, that means handing over more control to the coding agent without destabilizing the team.

The VSM does this for you. It gives you a guide for knowing what systems to build and interventions to put in place in order to progressively hand more control over to the AI safely.

The VSM

These systems have numbers, but they’re not entirely ordered. Treat the numbers like names.

System 1: Operations

Getting stuff done.

Before S1:

  • No agent. You write code by hand in your favorite text editor. You were a viable system on your own, without any agent involvement.

After S1:

  • Using a coding agent to write most or all of the code.

Most agentic coding tutorials will get you this far.

System 2: Coordination

How does the system avoid tripping itself up?

Before S2:

  • Agent writes code that it later can’t navigate
  • Agent changes files that conflict with other people on your team (inhibits you from participating in the S1 of a larger viable system, your team).
  • Agent adds dependencies that your company can’t use for legal reasons (inhibits you from participating in the S1 of a larger viable system, your company).

After S2:

  • Agent can make changes in a large project over many months and years without stepping over itself.

If your agent needs to be manually reminded to use good coding practices, or to handle certain modules differently, then you’re still operating S2 yourself. Once the agent can do it autonomously, without reminder, then you progress to S3.

Today’s tools for getting to S2 include AGENTS.md, skills, Git, tests, type systems, linters, and formal methods. It also involves a fair amount of skill, but as the tools improve it involves less skill.

System 3: Resource Allocation

Where do compute/time resources go? What projects/tasks get done?

Before S3: You prompt the agent and it does a task.

After S3: The agent pulls tasks from a backlog, correctly prioritizing work.

To get to this point you need a fully functioning System 2 but also an established set of values (System 5) that the agent uses to prioritize. You also need some level of monitoring (System 4) to understand what issues are burning and are highest priority.

Today’s agentic coding tools don’t do this. They’re designed to keep the user in control. Why? Because we largely haven’t figured out S2. Also, when you jump beyond S2, you need to arrive at S3 & S4 at close to the same time. Most products can’t easily offer this in a way that customers can easily integrate.

System 4: World Scanning

Reading the world around the agent to understand if it’s fulfilling its purpose (or signal where it’s not).

Before S4: Agent prioritizes work well, but customer’s biggest issues are ignored.

After S4: The system is self-sustained and well-balanced.

On a simple level, ask yourself, “how do I know if I’m doing my job well?” That’s what you need to do to get a functioning S4. e.g. If you logged into production and realized the app was down, you’d have a strong signal that you’re not doing your job well.

The obvious S4 tool is ops monitoring & observability. But also channels to customers & stakeholders. Being able to react to incidents without over-reacting involves well-functioning S3 & S5. Generally, attaching the agent to the company Slack/Teams seems like an easy win.

To do S4 well, the agent needs to build a “mental model” for how it fits into the larger VS above it, like the team or the company. Doing this well involves state; the agent needs a place to collect its thoughts about how it fits into larger systems. Tools like Letta give you agent state, hooks for building such a model.

System 5: Policy

The agent’s purpose, values, operating rules and working agreements.

Unlike the other systems, S5 isn’t easily separable. You can’t even build a functioning S2 without at least some S5 work. Same with S3 & S4.

I’ve found that, in building agents, you should have a set of values that are in tension with each other. Resolvable with logic, but maybe not clearly resolvable. e.g. “think big” and “deliver quickly”.

What Comes Next?

Congrats! If you have a coding agent that can operate itself, implementing all of S1-S5, the next step is to make a team of 2-5 agents and start over at S2 with the team, a higher-level viable system.

Algedonic Signals

Pain/Pleasure type signals that let you skip straight from S1 to S5.

Sprint retrospectives in agile teams are a form of algedonic signal. They highlight things that are going well or not, so that the team can change its Policy (S5), which often involves changing S3-S4 as well.

An algedonic signal in coding agents might be an async process that looks through the entire code base for risky code. Or scans through ops dashboards looking for missed incidents. Algedonic signals can be a huge stabilizing force. But, they can also be a huge distraction if used wrong. Treat with care.

POSIWID (the Purpose Of a System is What It Does)

It’s a great mantra. POSIWID is a tool for understanding where you currently are. Not where you’re meant to be, it’s just what you are today. But if you can clearly see what you are today, and you have the foresight to clearly articulate where you need to be, then it’s pretty easy to adjust your S5 Policy to get there.

How To Interview

Let’s say you’re hiring engineers to work on a team. You want your team to be highly leveraged with AI, so your next hire is going to really know what they’re doing. You have an interview where the candidate must use agentic coding tools to do a small project.

How do you evaluate how they did?

I argue that if you penalize candidates for using AI too much, that leads to all sorts of circular logic. You want AI, but you don’t. So that leaves the candidate with a bit of a gamble. However much they end up using AI is a pure risk, some shops will appreciate and others will judge them for it.

Instead, break out the VSM. Which systems did they use (intentionally or not)? Did they define values & expectations in their initial prompt? Did they add tests? Did they give it a Playwright MCP server so it could see its own work (especially if they can articulate why it’s important)? Did they think, mid-session, about how well the session is progressing (algedonic signals)?

This focuses attention on skills that are likely to lead to long-term success. They say you should test candidates on what they’ll actually be doing in their job. The job is changing fast; it’s hard to see what even the next year will be like. But you can bet VSM-aligned thinking will still be relevant.

Conclusion

Viable systems are recursive. Once you start seeing patterns that work with coding agents, there may be an analog pattern that works with teams. Or if your company does something really cool, maybe there’s a way to elicit the same effect in a coding agent.

It’s systems all the way down.