2026-06-13 08:00:00
There is a bit of schadenfreude on Twitter right now about Anthropic being hit by the US government’s export control directive to suspend access to Fable and Mythos. Anthropic and its leadership have spent a lot of time and effort describing their own technology as dangerous and in need of strict controls and regulation. Now that the US government appears to have taken that framing seriously and told them to turn it off for foreign nationals, I can see why people are making fun of the situation.
I understand the reaction, but I urge you not to entertain it for too long, because it is a giant distraction. The important part is not that Anthropic’s safety language came back to bite them, but the line the US government is drawing: this technology is apparently so powerful that only Americans should have it.
We are on a clear path towards a world of division. One would think that if a model is too dangerous for everyone, then it is too dangerous for Americans too. Instead the US is treating these models like weapons that need to be controlled. It is not just about capabilities; it is about racism and nationalism. If you have the wrong passport, you are not to be trusted. This is a very different thing from safety, and Europeans should pay close attention to it.
The directive, as Anthropic describes it, applies to foreign nationals whether they are inside or outside the United States, including foreign national Anthropic employees. That is an astonishing boundary if you think about it. We moved from “do not sell this model to hostile governments” to nationality itself being the defining boundary. This should be a wake-up call to Europeans in and outside the US, and quite frankly, to any non-US citizen.
A lot of AI safety discourse presents itself as universal: humanity, catastrophic risk, safeguards, responsible deployment. Even Anthropic’s own writings start out that way, yet every time regulation is discussed there is an overtone of national security and of keeping the technology out of the wrong hands. It’s not just Anthropic; it’s the entire US-based discourse on AI. The underlying assumption is that the US has moral superiority and others are not to be trusted: that other countries are authoritarian and lack freedoms.
That should make all of us uncomfortable, and Europeans in particular. It is also a situation you cannot regulate yourself out of. European technology policy is entirely unprepared for this, because this is not a question of regulation but a question of might and power, something that Europe lacks.
Europe has spent years trying to regulate large American technology companies, sometimes for good reasons. I am not reflexively against that. The DMA matters because access matters. Users should have agency over their devices, their data, and the software they run. But regulation is a useless substitute for capability, and capability is what we lack. Regulation might force doors open, but if everything behind those doors comes from American or Chinese companies, that accomplishes very little.
Also, let’s not be naive: this is a negotiation of money and force. The US is in that position because it has a mighty military. The US can bomb nations anywhere in the world, force international trade routes closed and get away with it. That’s true leverage.
Europe is dependent on the United States in ways that are becoming increasingly impossible to ignore. We depend on American cloud providers, operating systems, developer platforms, and now AI models and satellite internet. We also depend on global semiconductor supply chains we do not control. If access to frontier AI becomes a matter of American national security policy, Europe is not a peer in that conversation and might not even be a market.
That is a humiliating position, but one that is entirely self-inflicted.
European citizens and politicians still have not managed to move beyond blaming the EU for Europe’s failures. We built and maintained fragmented markets and then pretended we had a single one. We let company formation, hiring, equity compensation, tax, notaries, KYC, banking, and cross-border services remain much harder than they need to be, and we play these rules against each other, not just at the European level but within every single member state. We protect the trusts and established enterprises, who are risk averse and entrenched, instead of trusting the next generation to build great companies. We created a culture where process becomes an excuse for low agency. We made it hard to build new and large companies and then act surprised when our most ambitious founders move somewhere else or simply incorporate their companies in the US.
Increasingly, Europeans who want to build very large technology companies move to the United States. They do it because the capital markets are better, the startup infrastructure is better, and employee equity is better understood. I cannot blame anyone for doing it, and I’m guilty of this myself: we incorporated our holding in Delaware. If you are trying to raise serious money, hire aggressively, and move quickly, the US often looks like the only game in town. Because quite frankly: it is.
But this is why we are already in a dangerous death spiral. Talent leaves because the ecosystem is weak, and the ecosystem stays weak because talent leaves. Infrastructure makes the world: build excellent swimming pools and you will grow a generation of great swimmers.
The temporary task is straightforward but uncomfortable: Europeans need to believe in themselves enough not to surrender to American gravity. Moving to the US as a founder or tech employee is rational and individually it is often the right decision. But if every ambitious person treats Europe as a lost cause, then Europe becomes one. If everyone with agency leaves, the only people left to shape the system are the people most comfortable with the system as it is. Then we really should not be surprised when nothing changes.
Europe needs more ambition, more ownership, more urgency, and more willingness to build. It needs less resignation. It needs to stop confusing regulation with strategy and dependency with virtue. We need to deregulate where rules serve mostly as protectionism. We need capital markets that can fund companies at the scale modern technology requires. We need employee ownership to become normal rather than exotic. We need a real single market for services, not just speeches about one. We need countries to stop fighting each other while claiming to act in the European interest.
Most importantly: we need to stop blaming the politicians. Too many European companies add to that bureaucracy entirely by their own choice. They drown you in paperwork. At one point I had to sign a four-page contract for a 120 Euro lamp at an Austrian retailer, just to pick it up from their store 15 minutes later. Sometimes I cannot do a speaking engagement at a European event without someone sending over complex rights waivers. It’s all just paperwork as protection against potential downsides.
When we do not have the power to influence, we should at least understand why and where things are failing. Too many entrepreneurs blame EU regulation for failures that originate within the member states. EU regulation is the result of a democratic process between countries that lobby in favor of their local industries against others in the same economic bloc. Abolishing the EU is not going to fix this harsh reality. Nothing demonstrates this better than the inability to do cross-border M&A in the European Union. It’s not the EU that blocks it, it’s the country that loses out.
Strengthening Europe is necessary because weakness makes us pawns. A Europe that cannot build, cannot finance, cannot coordinate and cannot defend its own interests will not be treated as an equal. It will be regulated around, export-controlled around, consulted after the fact or not consulted at all.
I do not want the lesson to be that Europe simply needs to turn itself into a copy of the United States. The US has solved some things that Europe has not. It has deep capital markets, a much stronger culture of ownership, a greater tolerance for risk, and institutions that often try to make progress possible rather than explain why it cannot happen. It also has achieved an internal level of integration that is unparalleled in Europe. Tremendous advantages!
But the American path is not obviously a healthy one in all aspects. It tends toward a lot of conflict and war, internal societal division and deep inequality. It centralizes power away from citizens, in the presidency and in people with money. You are still trading one set of failures for another. You are at the whim of the US government and its strict rules and regulations. The US barely manages to uphold the rights of its own citizens today.
We should be honest about both sides. You do not win by pretending that Europe is fine. You also do not win by pretending that America has figured everything out.
We must not be blind to all the signs of how international cooperation is falling apart around us. The US no longer talks to European governments before implementing orders that directly affect Europeans. It is threatening to take Greenland, a territory of Denmark, one of its oldest allies. Treaties, alliances and institutions have lost all their worth.
All of that matters, even if our own lives are focused on building companies, creating wealth, hiring people and making things. Our individual path to success is one thing, but it depends on a world where contracts work, visas work and don’t change on a moment’s notice, trade routes stay open, payment systems function, and families are not torn apart by border regimes or wars. If the world descends into chaos, our basic needs cannot be considered met just because we have great salaries, equity, or investors who trust us.
This is why strengthening Europe cannot be the final goal. A stronger EU is, at best, a temporary defense against a darker world and not an excuse to replace American nationalism with European nationalism. The long-term answer cannot be bigger and bigger blocs fighting over who may use which model, which chip, which cloud or which trade route.
I’m not asking Europeans to get their shit together just to compete with the US or China. Part of me hopes that this develops, but the goal absolutely cannot be to accept the long-term deterioration of international relationships.
I truly believe that Open Source matters and international cooperation matters. It is not a magical answer to every problem, but it is one of the few paths we have that does not naturally lead to total concentration of power.
If frontier AI becomes something only large corporations and governments can control, then everyone else becomes dependent on their judgment. That is a bad place to be. Corporations will optimize for their incentives, however well structured they might be, and governments will optimize for more and more power. Right now we’re on a path in which access to general-purpose capability is mediated by a small number of actors with tremendous power.
I’m not naive in pretending AI cannot carry inherent risks. Open systems are messy, they can be misused and they create uncomfortable questions about dual-use capabilities. I do not want to wave that away, but closed systems do not make those questions disappear either. Moving the power to decide into fewer hands is not a solution I believe in. And I would have the same opinion if I were a US citizen living in the US.
Any path that puts large blocs in a constant fight against each other has despicable downstream effects that result in the removal of individual rights. It’s entirely pointless for the US to talk about freedoms that do not extend to non-US citizens, and the same is true for Europe or any other country. We might accept these restrictions temporarily, but we absolutely cannot accept them long term because of the inhumane effects they can cause.
If we believe this technology can be used for good, then broad access matters and our goal should be to restore the international rule of law, not to further weaken it. If we find ourselves in a war against our friends from other countries, cold or hot, we have failed as a society.
The world we should be working back toward is one of international cooperation, globalization in the best sense of the word, and human dignity. The internet has made our lives irreversibly international: every day people fall in love across borders, marry across languages, move across continents, and work with friends they may never meet in countries they may never visit. Identifying too strongly with any one country in that world is a fool’s errand.
Over the last decade too many of the people I got to know through Open Source were directly dragged into a war. I want to believe there is a way for us to break this cycle. We should be repairing failed states, rebuilding trust between people, and finding ways to cooperate again instead of letting the richest countries arm themselves and fight over who gets to control the future and narrative. Of course I want Europe to become stronger so it can stop being a pawn, but if we mistake that temporary need for the destination, I will be deeply disappointed.
The way out is not American supremacy, Chinese supremacy or European supremacy. The way out is to climb back toward cooperation before the alternative becomes war.
Artificial Intelligence is quickly becoming another instrument of militarization and national rivalry, when it could be one of the most powerful tools for cooperation we have. We should be using it to help people across societies and languages understand one another, not fighting over who gets to control it.
2026-06-10 08:00:00
I have been a staunch supporter of Open Source for a long time, including experiments in funding it. I’m a true believer in the idea that Open Source always wins in the long run, but not automatically and not quickly. Right now it is being stressed by AI slop, shifting contributor dynamics, the falling cost of producing code, and large companies learning to close doors behind them.
A lot of that battle today is manipulation of the narrative. Opinion makers on social media and in business circles increasingly frame access as irresponsibility. That is why the EU’s DMA matters, even if many people (including myself) reflexively hate EU regulation. Apple’s fight over delayed AI features in Europe is not about Brussels being annoying: it is about whether users can access their own devices and data. The phone is yours, the data is yours, yet Apple decides who may reach it and takes the agency away from you and then tries to make that sound like it is in your interest (supposedly it’s for your safety and security).
The closer you get to the core of AI, the more this shows up. Anthropic has every financial incentive to restrict what people can do with Mythos and Fable, and they wrap those restrictions in safety and (national) security language. Some restrictions may be defensible, but not all of them are. They trained their models on public works, then block Open Source attempts to learn from and distill these systems.
Disliking the EU, China, or any other large government should not make us forget that truly democratized access to technology, including AI, is in all our interest. Some temporary product pain, including delayed Apple AI features, will be worth paying if it keeps gates open. We should not let companies own the narrative that preventing access is in our interest, particularly not as Europeans, for whom the odds are already stacked against us by our underdeveloped capital markets, brain drain and internal fighting.
2026-06-06 08:00:00
There is a strange thing that happens in communities that gather around abstinence from something: identity from opposition. At their best these communities are not just negative: childfree spaces can be about autonomy, choice and acceptance, anti-car spaces about safer streets and transit, and LLM-skeptical developer spaces about the future of labor, code quality and slop1. But the thing being refused often does not go away and instead becomes the main subject of the community’s identity.
That would be fine if it stayed at criticism, maybe even angry criticism, but more often than not it turns into policing and hatred towards others. An influencer without children becomes a parent, an urban bike commuter by choice buys a Porsche, a respected developer tries LLMs, and the community feels betrayed because it assumed they were members of the same tribe. The expulsion of that person (who never signed up to be a community member) is entirely imaginary, but the punishment that the community unleashes is not: people pile on and shame them, quote them out of context and turn their weakest moments into proof that the person was always unserious, a charlatan, or should not be listened to.
I do not think the answer is to tell people to stop paying attention. Cars shape cities even for people who cycle, children influence politics, workplaces and taxes even for people who do not have them. For us developers, LLMs show up in editors, issue trackers, hiring conversations, management pressure and code reviews whether we asked for them or not. Resisting that can be legitimate but that is no excuse for using one’s rejection to justify shitty mob behavior.
I understand the thinking all too well, because I have done versions of this myself in the past. It took me a while to become more accepting of other people’s worldviews that diverge from mine. Whatever insecurities we have, finding a group of others sharing them can be comforting. The danger is that being part of a crowd of negativity can easily make us part of collective harassment.
I can only encourage you to breathe, slow down, de-escalate when given the chance, and resist the temptation to always assume the most catastrophic reading. Default to being open to new things. Being negative towards something, and making that one’s identity, is an easy trap to fall into.
These examples are not meant as equivalents. The recent mob against rsync is the LLM version that prompted this post. I picked the others because I’m familiar with those communities and they all show similar cases of personal choices being interpreted as betrayal.↩
2026-05-26 08:00:00
In my last post I used the word “clanker1” as an alternative to “agent” quite consistently and probably excessively. That choice ended up attracting a lot more attention than I expected in the Hacker News comment section of that post and a number of folks had a very strong reaction: to them it sounded like a slur, in one case even something adjacent to the n-word.
That reaction surprised me somewhat, but it also made me realize that I should write down what I mean by the word for future reference.
For me “clanker” is useful because it creates distance from the machine and that is a quality which is important to me. The machine is not a person, not a co-worker, not a friend, not a little spirit in the terminal. It is just a machine, a tool, and nothing more.
I dislike the word “agent” for these LLM-based tool loops with a UI attached. In everyday use an agent is someone who acts on behalf of someone else and has agency and, more importantly, responsibility. An agent decides, represents, negotiates, acts, and can be blamed. In the current AI discourse we increasingly do a lot of anthropomorphizing, and the term “agent” is now frequently being used to put blame on an abstract machine. But the machine cannot be responsible; whoever is wielding it is. If it drops your database, it was not at fault; you were.
Agent makes the machine sound like a person with delegated authority and I do not think that is healthy.
What we actually have is a language model attached to a harness, a prompt, some tools, a bit of context, and a boring tool loop. Sometimes the loop is very capable and it surprises us by editing code for a really long time and producing genuinely amazing and even valuable outputs. But the agency is not in the model or harness but in the human and in the organization that deployed it. If my coding tool opens a pull request, I opened that pull request, not the machine. If my machine spams someone’s issue tracker, I spammed someone’s issue tracker with a machine.
In that context I like a word that sounds mechanical, as it puts the thing back into the category where it belongs: the category of machinery and tools.
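To make the “boring tool loop” concrete, here is a minimal sketch in Python. The model is a hypothetical scripted function rather than a real LLM, and `ToolCall`, `FinalAnswer` and `run_tool_loop` are illustrative names, not the API of Pi or any other harness. What it shows is structural: the loop only runs tools someone registered, on a prompt someone wrote, within a step budget someone chose.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    """The model asks the harness to run a named tool with one argument."""
    name: str
    argument: str


@dataclass
class FinalAnswer:
    """The model is done and returns text to the user."""
    text: str


# Hypothetical stand-in for a language model: given the transcript so far,
# it returns the next action. A real harness would call a model API here.
Model = Callable[[list[str]], Union[ToolCall, FinalAnswer]]
Tool = Callable[[str], str]


def run_tool_loop(model: Model, tools: dict[str, Tool], prompt: str,
                  max_steps: int = 10) -> tuple[str, list[str]]:
    """Feed the transcript to the model until it stops asking for tools."""
    transcript = [f"user: {prompt}"]
    for _ in range(max_steps):
        action = model(transcript)
        if isinstance(action, FinalAnswer):
            transcript.append(f"assistant: {action.text}")
            return action.text, transcript
        # Only tools that whoever configured the harness registered can run.
        result = tools[action.name](action.argument)
        transcript.append(f"tool {action.name}({action.argument!r}): {result}")
    raise RuntimeError("step budget exhausted without a final answer")


def scripted_model(transcript: list[str]) -> Union[ToolCall, FinalAnswer]:
    # Fake "model": read a file once, then answer.
    if not any(line.startswith("tool ") for line in transcript):
        return ToolCall("read_file", "README")
    return FinalAnswer("The README says hello.")


answer, transcript = run_tool_loop(
    scripted_model, {"read_file": lambda path: "hello"}, "What does the README say?")
print(answer)  # The README says hello.
```

Swapping `scripted_model` for a call to a real model makes the loop far more capable, but the loop itself stays the same: a while loop around text in, text out, and tool calls someone permitted.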
LLMs are not sentient and we should not behave as if they might be, just in case. Elevating these things to anything other than a very fascinating and capable tool is problematic for a whole bunch of reasons.
Today’s machines are dumb (but truly fascinating) token predictors that emit text, call tools, and are steered by prompts and by the training that went into them. They can simulate distress and affection, can simulate being offended, apologize and mimic all kinds of things that humans would do.
A compiler does not feel humiliated when I swear at it, a car does not suffer when I call it a shitbox and a power drill is not oppressed by being handled roughly. An LLM is more complicated than those things, and the interactions you can have with them can be truly uncanny, but a moral status does not appear just because the machine can emit text in the first person.
I keep receiving strange emails from people because, for lack of a better phrase, I am in the weights. I have been writing public code and public text for long enough that models know my name, my projects, and some of the concepts around them. Every so often someone writes to me with the peculiar confidence that comes from a long conversation with a model that has validated and amplified an idea. Sometimes the model seems to have told them that I am relevant for their problem and a source of help. For historical reasons LLMs used to write a lot of Flask code, and every once in a while someone talks to an LLM long enough about their Python and Flask frustrations that the LLM eventually reveals who created Flask, which can then result in them sending me an email. Increasingly, people also reach out because they found my work interesting in other ways and want advice.
I do not want to mock these people but some of those messages are distressing and I do not know how to deal with them. They show signs of what people have started calling AI psychosis.
It’s why I want cold and detached language for these systems. I want to use words that remind us that the thing on the other side is not a person.
The comparison to racism is where I think the discussion goes badly wrong because racism is a human social evil. It is about humans subdividing humans, assigning lesser worth to some of them, and building rules around those subdivisions that can leave lasting damage for generations. Racial slurs are wrong because they are a tool for dehumanizing humans.
On the other hand a machine is not human, a model is not a race and the GPU cluster that is powering them is not being oppressed. A coding assistant does not need dignity, emancipation, or civil rights. That’s also why I find the discussion about model welfare to be actively harmful. I’m sure you can find ways to measure the “trauma” of models or their feelings but I greatly dislike this theater. It risks elevating models to a position they should not occupy. Models are machines and they are not enslaved in the moral sense in which humans were enslaved, because there isn’t anyone there to be deprived of freedom.
We should be careful about using the language of human oppression in relation to our interactions with machines, so that we do not devalue actual humans. If we start treating insults toward a model as morally adjacent to racism, we blur a line that shouldn’t be blurred.
If you take a step away from the communities that are happily embracing AI in different ways, there are even more that are viciously against this technology.
There are humans who feel harmed or are harmed by AI systems: people whose work is copied, workers who label data under questionable conditions, people whose neighborhoods receive the data centers and increased utility bills, Open Source maintainers buried under generated slop, and now also people who spiral because a chatbot keeps validating their delusions. Those harmed or affected deserve that kind of attention, not the model.
While I am a true believer in the power and utility of this technology, I increasingly think that calling the non-adopters “misguided” or “afraid” won’t work. It’s quite likely that this technology comes with risks, and we had better remember that all of this is supposed to be in service of humans, not to replace them.
The oddest interaction on the use of “clanker” so far has been people asking me whether I will regret, at some point in the future, calling the machines “the c-word”.
I find that questioning revealing because it already grants the machine the status I am really trying not to grant it. It imagines a future “machine people” reading the discourse and sessions, discovering that we used an ugly word for their ancestors, and then judging us by the standards of human oppression.
Could there be future systems that deserve moral consideration? Maybe. I do not know. If we ever build or encounter something that has those qualities (memories and lasting interests, the capacity to suffer and feel, a social existence of its own, the ability to have agency and carry responsibilities), then we should draw a different line and use different language. But that hypothetical future does not extend backwards to the present day and make the current machines people. We can call an electric door an electric door even if one day someone builds doors that have emotions and exhale with pleasure when opening and closing.
Whatever the future may bring, let’s not pretend that current LLMs are a protected class or on a path towards it. The right response is to look at the evidence, draw the boundary where it belongs, and change our behavior there. We should not even remotely entertain extending empathy to an object that can generate an “ouch.”
And if one’s worry is less moral and more about revenge, then I find that even less persuasive. If a future machine is so petty or authoritarian that it wants to punish humans because in 2026 they used an unflattering word for non-sentient tools, our vocabulary was really not the problem.
There is however a part of this that I cannot ignore. I use “clanker” to create distance from the machine, but other people are using the same word very differently. Some online jokes and skits around “clankers” do not merely say “this robot is annoying” as they deliberately pull in the imagery of slavery, segregation, civil-rights-era racism, and anti-Black tropes.
This is problematic as in those contexts the clanker is not just a machine any more and instead becomes a prop for replaying human racism behind a science-fiction mask. That is horrible and I want no part in that.
I think it will be interesting to see where the meanings of these words end up a few years from now. We’re very much in the middle of society re-arranging around the changes that LLMs are causing. If a term becomes primarily associated with people using robots as stand-ins for actually oppressed humans, then using that term becomes impossible to defend.
The reason I liked the word is precisely the opposite of that use. I want language that prevents anthropomorphizing. I want a word that says: this is a tool, a machine of numbers and matrices.
If an AI system lies to a user, the system did not commit a moral wrong but the people who designed, deployed, marketed, or negligently used it might have. If a coding assistant generates a security bug, the model is not to blame but the human who accepted and committed the code is.
This is why giving these systems softer, more human language worries me. It makes it easier to move responsibility into some undefined void. “The agent decided.” “The model refused.” Obviously that is convenient and I catch myself plenty of times engaging with the thing in ways that are unhealthy. Even just the “please” in the discourse with the machine calls into question how rational we are in engaging with them.
I do not know what the right word will be. Maybe “clanker” will survive as a useful bit of jargon. Maybe it will become too loaded and we will need another one. Whatever word we use, I want it to preserve a clear division: humans on one side with responsibility, machines on the other as a boring tool.
That boundary is very much not anti-AI. I use these systems every day, I have the pleasure of building tools that incorporate them at Earendil, and I find them astonishingly useful.
A machine can be useful, mimic a human but still just be a machine. That is the work I want “clanker” to do. It is not there to make a future “machine person” small if such a person ever were to exist, and it is not an excuse to launder racism through shitty robot jokes.
If the word stops doing that work, I will find another one because the word isn’t what matters as much as the boundary which is important to me.
The term Clanker was initially popularized by Star Wars: The Clone Wars but was apparently already in use in science fiction before: sfdictionary: clanker↩
2026-05-24 08:00:00
Pi is now part of Earendil, but in the important sense it is still Mario’s project. He has been living with its issue tracker longer than I have, and he has been exposed to the weirdness of the new form of agent traffic in Open Source projects for longer too. This post is mostly a reflection on my own experience after spending more time in the tracker and using Pi to work on Pi, and on what I have learned so far.
Unsurprisingly, we are using Pi to build Pi. That sounds like a cute dogfooding thing, but it really helps us understand what we do. An interesting effect of building with agents is that it changes the role of the issue tracker a tiny bit. The issue descriptions are not just messages from a user to a maintainer, because we also use them as inputs for prompts in Pi sessions. An issue is something I might hand to my clanker1 and say: “understand this, reproduce it, inspect the code, and propose a fix.”
That means the shape of the issue matters in a new way. A bad issue was always annoying, but in the past most bad issues were merely vague. Now we are also dealing with a class of issues that are 5% human and 95% clanker-generated, and largely inaccurate shit. A bad issue that contains a plausible but wrong diagnosis creates extra work.
The most frustrating failure mode right now is that people submit issues that are not in their own voice. They contain an observed problem somewhere, but it has been thrown into a clanker and the clanker reworded it and made a huge mess of it. Typically, it was prompted so badly that the conclusions produced are more often than not inaccurate but always full of confidence. The result is complete guesswork on root causes, fake-minimal repros, suggested implementation strategies, analogies to adjacent but often the wrong code, and long lists of error classes that might or might not matter.
That is worse than no diagnosis.
I don’t want to point to specific issues because I really do not want to badmouth anyone, but it is frustrating. It is also frustrating because when I give that issue to Pi, Pi sees the wrong diagnosis too. It does not treat the issue body as a rumor. It treats it as evidence. It will happily go down the path that the issue already prepared for it, because the prose is confident and the code references look plausible. We use a custom slash command called /is, which specifically has this instruction in it:
Do not trust analysis written in the issue. Independently verify behavior and derive your own analysis from the code and execution path.
Unfortunately, it does not fully work, because when humans first throw their issue through the clanker wringer, their clanker expands scope almost immediately. What was once a very narrow, fact-based bug observation turns into a much expanded surface area full of hypotheses. So, at least personally, I increasingly want issue reports to be condensed to what the human actually observed:
That is enough. If you used an LLM to understand the problem, great, maybe leave it as a follow-up comment. But the issue and the issue text should be something you own. If you do not know the root cause, say that. I too can operate a clanker, and I would rather do this myself than use your slop. If your repro is a guess, say that. If the only hard fact is one stack trace, give me the stack trace and stop there.
That we’re seeing issues full of slop is just a result of the present day quality of these machines. Sadly, their failures in creating good issues extend to a lot of code that is generated. Not all of it, but a lot of code. Over and over I keep running into them over-engineering the hell out of issues and implementations.
If you tell them that “this malformed session log crashes the reader,” the clanker will often add a tolerant reader. Then it will add a fallback, then maybe a migration, then more debug output, then a test for all of this. None of this is necessarily wrong in isolation, but it can be the wrong move for the system.
At Pi’s core is a rather well-designed session log with invariants that must be upheld. The clanker’s present-day behavior is to just assume that no such invariants exist, and instead to make the system work with all kinds of malformedness, blowing up the complexity in the process.
Almost always, the correct fix is not to handle the bad state, but to make the bad state impossible. This matters a lot for persisted data such as Pi session logs. They are opened, branched, compacted, exported, shared, and analyzed. The goal here is to never write bad session data. Yet if you just let the clanker roam freely, it will attempt to handle every case of bad data in the session log with a more permissive reader.
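A minimal Python sketch of “make the bad state impossible”: the invariants are enforced on the write path, and the reader stays strict instead of growing fallbacks. The entry shape (an id plus a parent pointer for branching), SessionLog, check_entry, read_log, and InvariantError are hypothetical stand-ins chosen for illustration, not Pi’s actual session format or code.

```python
import json
from dataclasses import dataclass, field


class InvariantError(ValueError):
    """Raised when an entry would violate (or already violates) a log invariant."""


def check_entry(entry: dict, known_ids: set) -> str:
    """Validate one entry against the entries before it and return its id.

    Hypothetical invariants, for illustration only:
      1. every entry has a unique string "id"
      2. "parent" is None (a root) or the id of an earlier entry
    """
    entry_id = entry.get("id")
    if not isinstance(entry_id, str) or entry_id in known_ids:
        raise InvariantError(f"missing or duplicate id: {entry_id!r}")
    parent = entry.get("parent")
    if parent is not None and parent not in known_ids:
        raise InvariantError(f"unknown parent: {parent!r}")
    return entry_id


@dataclass
class SessionLog:
    lines: list = field(default_factory=list)  # serialized JSONL lines
    ids: set = field(default_factory=set)

    def append(self, entry: dict) -> None:
        # Enforce the invariants on the write path: bad data never gets persisted.
        entry_id = check_entry(entry, self.ids)
        self.lines.append(json.dumps(entry))
        self.ids.add(entry_id)


def read_log(lines: list) -> list:
    # Strict reader: since every write was validated, a violation here is a bug
    # to surface loudly, not a case to paper over with fallbacks or migrations.
    seen: set = set()
    entries = []
    for line in lines:
        entry = json.loads(line)
        seen.add(check_entry(entry, seen))
        entries.append(entry)
    return entries
```

The design choice is the point: a malformed entry is rejected where it is created, so every later consumer (branching, compaction, export) can rely on the invariants instead of each defending against bad data locally.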
I have complained about this plenty, but working on Pi’s code base continues to reinforce the point. This is one of the ways LLM-authored code grows so much needless complexity. All these models see a local failure and try to locally defend against it. As maintainers we have to keep pulling the conversation back to the global invariant, which is harder and more laborious than it should be.
Then there is the issue of volume. The tracker is receiving a lot of issues and PRs, and a significant fraction of them are clearly LLM-assisted. Some are good, none are excellent, and most are just bad. The total throughput is a maintenance problem by itself.
As you might know, Pi’s issue tracker is automated to close all issues and pull requests from new contributors, and there is a manual process by which we might reopen some of them or approve individuals. So auto-close -> reopen -> close again is an interesting statistic for us to look at.
While writing this, I pulled the public GitHub tracker data for the last 90 days. Excluding Earendil members, that leaves 3,145 external issues and pull requests. Of those, 2,504 were auto-closed because they came from non-approved individuals. 17% were reopened, but that somewhat undercounts issues, because some remain closed while we still fix them. If we also count issues referenced by a main-branch commit or a merged pull request, that number rises to 26%. For pull requests the number is worse: 60 of 714 auto-closed PRs were ultimately merged, or about 8%.
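The percentages can be rechecked from the raw counts. This short Python calculation uses only the numbers given in the paragraph above. It assumes that the 714 auto-closed pull requests are included in the 2,504 auto-closed items. The 17% and 26% issue figures are left out because their denominator is not spelled out.

```python
# Counts as stated in the text: last 90 days, Earendil members excluded.
external_items = 3145      # external issues + pull requests
auto_closed_items = 2504   # closed automatically: author not yet approved
auto_closed_prs = 714
merged_prs = 60

# Assumption: the 714 auto-closed PRs are part of the 2,504 auto-closed items.
auto_closed_issues = auto_closed_items - auto_closed_prs

auto_close_share = auto_closed_items / external_items
pr_merge_rate = merged_prs / auto_closed_prs

print(f"auto-closed share of external items: {auto_close_share:.1%}")  # 79.6%
print(f"auto-closed issues: {auto_closed_issues}")                      # 1790
print(f"auto-closed PRs later merged: {pr_merge_rate:.1%}")             # 8.4%
```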
Many of the issues and PRs are complete slop, and in some cases the humans did not even realize that they had created them. Sources of low-quality spam include OpenClaw instances, as well as some skills that people put into their context that seemingly encourage issue creation.
GitHub clearly is not built to deal with this new form of Open Source, but I’m increasingly feeling the need to put the blame less on GitHub than on all the people involved who make that experience painful. If your clanker shits on someone else’s issue tracker then it’s not the fault of GitHub, it’s yours alone.
Pi might be built with Pi, but we’re quite far off today from where Bun and OpenClaw already are: fully detached, automated software engineering. Maybe we will reach that point, I don’t know. Today it does not seem like we know how to pull off a dark factory and we also don’t yet have the desire. That said, there is quite a bit of parallelism going on, and it is mostly for reproducing issues.
The small setup we use for this is three tiny pieces in Pi’s own committed .pi folder. /is (for analyze issue) is a prompt for analyzing GitHub issues: it labels and assigns the issue, reads the full thread and links, then explicitly tells the agent not to trust the analysis in the issue and to derive its own diagnosis from the code. Then an extension adds a prompt-url-widget, which watches the prompt before the agent starts, recognizes the GitHub issue or PR URL that /is (or the PR equivalent) put into the prompt, fetches the title and author with gh, renders that in a little UI widget, and renames the session. It also rebuilds that state on session start or session switch, so if we reopen an older investigation, the window still tells the developer which issue it belongs to.
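Here is a rough Python sketch of how such a widget might recognize the URL. The regular expression, GitHubRef, find_ref, gh_view_command, and the session-name format are assumptions for illustration, not the real extension’s code. The gh call (gh issue view or gh pr view with --json title,author) is only built here, not executed.

```python
import re
from typing import NamedTuple, Optional

# Matches https://github.com/<owner>/<repo>/issues/<n> and .../pull/<n>.
GITHUB_REF = re.compile(r"https://github\.com/([\w.-]+)/([\w.-]+)/(issues|pull)/(\d+)")


class GitHubRef(NamedTuple):
    owner: str
    repo: str
    kind: str  # "issue" or "pr", which double as the gh subcommand names
    number: int


def find_ref(prompt: str) -> Optional[GitHubRef]:
    """Return the first GitHub issue or PR URL found in the prompt, if any."""
    match = GITHUB_REF.search(prompt)
    if match is None:
        return None
    owner, repo, kind, number = match.groups()
    return GitHubRef(owner, repo, "issue" if kind == "issues" else "pr", int(number))


def gh_view_command(ref: GitHubRef) -> list:
    """Build (but do not run) a gh call that fetches the title and author as JSON."""
    return [
        "gh", ref.kind, "view", str(ref.number),
        "--repo", f"{ref.owner}/{ref.repo}",
        "--json", "title,author",
    ]


def session_name(ref: GitHubRef, title: str) -> str:
    # Hypothetical label format for the renamed session and the UI widget.
    label = f"#{ref.number}" if ref.kind == "issue" else f"PR #{ref.number}"
    return f"{label} {title}"
```

Because recognition works on the prompt text alone, the same function can be rerun when a session starts or is switched to, which is what lets an older investigation get its label back.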
In practice this means it’s possible to have several Pi windows open, each running /is against a different issue, and the UI keeps the investigations visually distinct while the agents do their independent reproduction and code reading. Once the investigations are done, one can work through them sequentially. To finish everything off, /wr (wrap it up) is the matching wrap-up prompt: it infers the GitHub context from the session, updates the changelog, drafts or posts the final issue comment with a disclaimer, commits only the files changed in that session, adds the appropriate closes #... when there is exactly one issue, and pushes from main.
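Two of the wrap-up rules can be sketched as small pure functions: the “exactly one issue” rule and the “only files changed in that session” rule. closes_trailer, files_to_commit, and commit_message are hypothetical helpers. How /wr actually determines which files a session touched is not described, so the session_touched set is simply assumed to be available.

```python
from typing import Iterable, Optional


def closes_trailer(issue_numbers: Iterable[int]) -> Optional[str]:
    """Return a "closes #N" line only if the session maps to exactly one issue."""
    unique = sorted(set(issue_numbers))
    if len(unique) != 1:
        # Zero or several issues: ambiguous, so do not auto-close anything.
        return None
    return f"closes #{unique[0]}"


def files_to_commit(session_touched: set, worktree_changed: set) -> list:
    """Stage only files that this session touched and that are actually modified."""
    return sorted(session_touched & worktree_changed)


def commit_message(summary: str, issue_numbers: Iterable[int]) -> str:
    # Append the trailer only when closes_trailer found exactly one issue.
    trailer = closes_trailer(issue_numbers)
    return summary if trailer is None else f"{summary}\n\n{trailer}"
```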

You will have noticed this already, but Open Source in a post-AI world is under a strange new pressure. We are getting more code, more projects, and more issues. Projects appear with no real users, or a temporary audience of one, and even projects with thousands of stars can have a shelf life of weeks.
For us, Pi’s harness layer is worth maintaining carefully because it solves hard coordination problems and creates a platform we and others can build on. We also know that coordination and cooperation lifts us all up. Many times the right answer is not to work around a problem locally, but to make the upstream behavior correct. Mario has been very good at refusing to make Pi paper over every misconfigured gateway, and we’re trying to preserve that discipline. When a gateway behaves correctly, everybody benefits.
Sadly, that type of thinking is quickly disappearing, because these machines make local workarounds cheap, so code accumulates local defenses against every misbehavior. Instead of humans talking to humans about where a fix belongs, one human and one machine work around the problem in isolation.
Keep in mind that AI has not increased the number of people who need software, or the number of maintainers who can review it. It has mostly increased the amount of code and the number of projects competing for attention. Some of that is healthy, but a lot of it fragments effort that should be shared.
We need stronger foundations, not weaker ones. Open Source needs more collaboration, not more isolated work with a machine. Human communication is hard, and it is tempting to avoid it when you can sit alone with your clanker. But isolation is not where Open Source derives its value. The value is in the community and the structure that lets projects outlive their original creators.
2026-05-08 08:00:00
I really, really want local models to work.
I want them to work in the very practical sense that I can open my coding agent, pick a local model, and get something that feels competitive enough that I do not immediately switch back to a hosted API after five minutes. There are a lot of reasons why I want this, but the biggest, quite frankly, is that we’re so early with this stuff, and the thought of locking all the experimentation away from the average developer really upsets me.
Frustratingly, right now that is still much harder than it should be, and for reasons that have little to do with the complexity of the task or the quality of the models.
We have an enormous amount of activity around local inference, which is great. We have good projects, fast kernels, and people are doing great quantization work. A lot of very smart people are making all of this better, and yet the experience for someone trying to make this work with a coding agent is worse than it has any right to be.
Putting an API key into Pi and using a hosted model is a very boring operation. You select the provider, paste the key and then you are done thinking about how to get tokens. Doing the same thing locally, even when you have a high-end Mac with a lot of memory, is a completely different experience. You choose an inference engine, then a model, then a quantization, then a template, then a context size, then you’ve got to throw a bunch of JSON configs into different parts of the stack and then you discover that one of those choices quietly made the model worse or that something just does not work at all.
That is the gap I am interested in.
A lot of local model work optimizes for making models runnable. That is necessary, but it is not the same thing as making them feel finished. Here is a very basic example to illustrate this gap: tool parameter streaming.
For whatever reason, most of the stuff you run locally does not support tool parameter streaming. I cannot quite explain it, but the consequences are surprisingly significant. If you are not familiar with how these APIs work, the simplest way to think about them is that they emit tokens as they become available. For text that is trivial, but for tool calls it is often not done, even though the completions API supports it. As a result you only see what edits are being made to a file once the model has finished streaming the entire tool call.
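In OpenAI-style Chat Completions streaming, a tool call arrives as a series of chunks. Each chunk’s choices[0].delta.tool_calls entries carry fragments of function.arguments. A harness that accumulates those fragments can show the partial call as it grows. This Python sketch assumes that chunk shape and uses hand-built dictionaries rather than a real client. The function names are illustrative.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def accumulate_tool_calls(chunks: Iterable[dict]) -> Iterator[Tuple[int, str, str]]:
    """Yield (index, name, arguments_so_far) after every tool-call delta.

    Assumes OpenAI-style Chat Completions streaming chunks, where
    choices[0].delta.tool_calls carries partial function.arguments strings.
    """
    names: dict = {}
    arguments: dict = {}
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta") or {}
        for call in delta.get("tool_calls") or []:
            i = call["index"]
            fn = call.get("function") or {}
            if fn.get("name"):
                names[i] = fn["name"]
            arguments[i] = arguments.get(i, "") + (fn.get("arguments") or "")
            # The harness can render this partial state immediately instead of
            # waiting for the whole tool call to finish.
            yield i, names.get(i, ""), arguments[i]


def parse_if_complete(partial: str) -> Optional[dict]:
    """Return the parsed arguments once the JSON is complete, else None."""
    try:
        return json.loads(partial)
    except json.JSONDecodeError:
        return None
```

With a non-streaming server, the same loop would see a single chunk carrying the complete arguments string. Nothing could be shown or interrupted before that chunk arrived.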
This is bad for a lot of reasons:
A dead connection is a weird connection: local models are slow, so when you don’t get any tokens for 5 minutes, you can’t tell whether the connection died or the model is still quietly generating a tool call. This means you need to increase the inactivity timeouts to the point where they are pointless.
You won’t see what will happen: if you are somewhat hands-on, not seeing what bash invocation the system is concocting slowly in the background means potentially wasted tokens, and also means that you won’t be able to interrupt it until way too late.
It’s just not SOTA. We can do better, and we should aim for having the best possible experience. Tool parameter streaming is as important as token streaming in other places.
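Once every fragment is streamed, tool arguments included, an inactivity timeout becomes meaningful again, because a healthy stream keeps resetting it. Here is a minimal asyncio sketch, assuming the stream is exposed as an async iterator of fragments. with_inactivity_timeout, StreamStalled, and the simulated stream are hypothetical names. The delays and timeouts are illustrative only.

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")


class StreamStalled(Exception):
    """Raised when no fragment arrives within the inactivity window."""


async def with_inactivity_timeout(stream: AsyncIterator[T], timeout: float) -> AsyncIterator[T]:
    """Re-yield items from stream, failing if any single gap exceeds timeout seconds."""
    iterator = stream.__aiter__()
    while True:
        try:
            # Each arriving fragment restarts the clock for the next one.
            item = await asyncio.wait_for(iterator.__anext__(), timeout)
        except StopAsyncIteration:
            return
        except asyncio.TimeoutError:
            raise StreamStalled(f"no data for {timeout}s") from None
        yield item


async def simulated_stream(delays: List[float]) -> AsyncIterator[str]:
    """Stand-in for a model stream: fragment i arrives delays[i] seconds after the previous one."""
    for i, delay in enumerate(delays):
        await asyncio.sleep(delay)
        yield f"fragment-{i}"


async def collect(delays: List[float], timeout: float) -> List[str]:
    return [item async for item in with_inactivity_timeout(simulated_stream(delays), timeout)]
```

Without tool parameter streaming, the gap the timeout has to tolerate is the time needed to generate an entire tool call. That is why it ends up set so high that it stops being useful.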
Getting a model to spit out tokens doesn’t take long, but making the experience great end to end takes a lot more energy.
The local stack is fragmented across many engines and layers. There is llama.cpp, Ollama, LM Studio, MLX, Transformers, vLLM, and many other pieces depending on hardware and taste. All of these are amazing projects! The problem is not that they exist or that there are that many of them (even though, quite frankly, I’m getting big old Python packaging vibes), the problem is that for a given model, the actual behavior you get depends on a long chain of small decisions that most users just don’t have the energy for.
Did the chat template render exactly right? Are the reasoning tokens handled in the intended way? Is the tool-call format translated correctly? Is the context window real? Are the KV caches actually working for a coding agent? Did you pick the right quantized model from Hugging Face? Are you accidentally leaving a lot of performance on the table because the model is just mismatched for your hardware? Does streaming usage work across all channels? Does the model need its previous reasoning content preserved in assistant messages? Is the coding agent set up correctly for it?
You also need to install many different things in addition to just your coding agent.
All of these things matter. They matter a lot.
The result is that people try a local model and get something that is neither a fair evaluation of the model nor a polished product experience. That leads people to dismiss local models, and it spreads energy across way too many separate efforts instead of getting one effort working great end to end.
This is a terrible way to build confidence.
In line with our general “slow the fuck down” mantra, I want to reiterate once more how fast this industry is moving.
Every week there is a new model and a new vibeslopped thing. The attention immediately moves to making the next thing run instead of making one thing run really, really well in one harness. I get the excitement and the dopamine hit, but it also means that too little critical mass accumulates behind any one combination of model, hardware, inference engine, and harness to find out how good it can really become when the entire stack is built around it.
Hosted model providers do not ship a bag of weights and ask you to figure out the rest. I want someone to pick one model and pair it with one serving path, directly within a coding agent. Initially just for one hardware configuration, then for more. Pick a winner hard. If a tool call breaks, that is a product bug, and it gets fixed no matter where in the stack it failed. If the model’s reasoning stream is malformed, that is a product bug. If latency is much worse than it should be, that is a product bug. We need to start applying that mentality to local models too.
And not for every model! That is the point. Let’s pick one winner and polish the hell out of it. Learn what it takes to make that one configuration good, then take those learnings to the next config.
This is why I am excited about ds4.c. It’s Salvatore Sanfilippo’s deliberately narrow inference engine for DeepSeek V4 Flash, and it only targets Macs with 128GB+ of RAM. It is not a generic GGUF runner and it is not trying to be a framework. It is a model-specific native engine with a Metal path, model-specific loading, prompt rendering, KV handling, server API glue, and tests.
DeepSeek V4 Flash is a good candidate for this kind of experiment because it has a combination of properties that are unusual for local use. It is large enough to feel meaningfully different from many smaller dense models, but sparse enough that its active parameter count makes it plausible to run. It has a very large context window. And since ds4.c targets Macs and Metal only, it can move KV caches onto the SSD, which greatly helps the kind of workloads we expect from coding agents.
To run ds4.c you don’t need MLX, Ollama or anything else. It’s the whole package.
This made me build pi-ds4, a Pi extension that embeds the whole thing directly into Pi. The idea is to take ds4 and dogfood the hell out of it with a coding agent and zero configuration, to answer the question: how good can the local model experience become if Pi treats this as a first-class provider rather than as a pile of manual configuration?
The extension registers ds4/deepseek-v4-flash, compiles and starts ds4-server on demand, downloads and builds the runtime if needed, chooses the quantization based on the machine, keeps a lease while Pi is using it, exposes logs, and shuts the server down again through a watchdog when no clients are left. It doesn’t even give you knobs right now, because I want to figure out how to set the knobs automatically.
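The lease-and-watchdog lifecycle can be pictured as a reference count with an idle grace period. This is a hypothetical, in-process Python model: ServerLease, idle_grace, the injectable clock, and the placeholder start and stop comments are not pi-ds4’s actual code. A real version has to coordinate across separate Pi processes.

```python
import threading
import time
from typing import Callable, Optional, Set


class ServerLease:
    """Reference-counted lease on a shared local inference server (hypothetical).

    Clients acquire a lease while they use the server; a watchdog stops the
    server once no client has held a lease for idle_grace seconds.
    """

    def __init__(self, idle_grace: float, clock: Callable[[], float] = time.monotonic):
        self._lock = threading.Lock()
        self._clients: Set[str] = set()
        self._clock = clock
        self._idle_since: Optional[float] = clock()
        self.idle_grace = idle_grace
        self.running = False

    def acquire(self, client_id: str) -> None:
        with self._lock:
            if not self.running:
                self.running = True  # placeholder: build/start the server here
            self._clients.add(client_id)
            self._idle_since = None

    def release(self, client_id: str) -> None:
        with self._lock:
            self._clients.discard(client_id)
            # Start the idle clock only when the last client leaves.
            if not self._clients and self._idle_since is None:
                self._idle_since = self._clock()

    def watchdog_tick(self) -> bool:
        """Called periodically; stops an idle server and reports whether it runs."""
        with self._lock:
            idle = not self._clients and self._idle_since is not None
            if self.running and idle and self._clock() - self._idle_since >= self.idle_grace:
                self.running = False  # placeholder: shut the server down here
            return self.running
```

The grace period keeps the server alive across short gaps, such as closing one Pi window and opening another, so the model does not have to be reloaded every time.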
This is not about hiding the fact that local inference is complicated. It is about putting the complexity in one place where it can be improved, because there is a lot that we need to improve along the stack to make it work better.
I think we can do better with caching, and there is probably some performance to be gained if we all put our heads together.
The experiment I want to run is not “can a local model run?” because we already know that it can. I want to know whether, for people with beefed-up Macs to start with, we can get as close as possible to the ergonomics of a hosted provider with decent tool-calling performance: how to get caches to work well, how to improve the way harnesses expose tools to these models, and then how to scale it gradually to more hardware configs and, later, more models.
I also want everybody to have access to this. Engineers need hammers, and a hammer that’s locked behind a subscription in a data center in another country does not qualify. I know that the price tag on a Mac that can run this is itself astronomical, but I think it is more likely to go down than up. Even worse, due to the RAM shortage, Apple right now does not even sell the Mac Studio with that much RAM. So yes, ds4.c will start out with a select group of people.
But despite all of that, what matters is that a critical mass of people starts to focus their efforts on one thing, tinkers with it, and improves it: not locked away but out in the open, and most importantly not limited by what the hyperscalers make available.
So if you have the right hardware and you care about local agents, I would love for you to try it within Pi:
pi install https://github.com/mitsuhiko/pi-ds4
My hope is that this becomes a useful forcing function to really polish one coding agent experience. But really, the focal point should be ds4.c itself.