MoreRSS

site iconSimon WillisonModify

Creator of Datasette and Lanyrd, co-creator of the Django Web Framework.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Simon Willison

AI should help us produce better code

2026-03-11 06:25:09

Agentic Engineering Patterns >

Many developers worry that outsourcing their code to AI tools will result in a drop in quality, producing bad code that's churned out fast enough that decision makers are willing to overlook its flaws.

If adopting coding agents demonstrably reduces the quality of the code and features you are producing, you should address that problem directly: figure out which aspects of your process are hurting the quality of your output and fix them.

Shipping worse code with agents is a choice. We can choose to ship code that is better instead.

Avoiding taking on technical debt

I like to think about shipping better code in terms of technical debt. We take on technical debt as the result of trade-offs: doing things "the right way" would take too long, so we work within the time constraints we are under and cross our fingers that our project will survive long enough to pay down the debt later on.

The best mitigation for technical debt is to avoid taking it on in the first place.

In my experience, a common category of technical debt fixes is changes that are simple but time-consuming.

  • Our original API design doesn't cover an important case that emerged later on. Fixing that API would require changing code in dozens of different places, making it quicker to add a very slightly different new API and live with the duplication.
  • We made a poor choice naming a concept early on - teams rather than groups for example - but cleaning up that nomenclature everywhere in the code is too much work so we only fix it in the UI.
  • Our system has grown duplicate but slightly different functionality over time which needs combining and refactoring.
  • One of our files has grown to several thousand lines of code which we would ideally split into separate modules.

All of these changes are conceptually simple but still need time dedicated to them, which can be hard to justify given more pressing issues.

Coding agents can handle these for us

Refactoring tasks like this are an ideal application of coding agents.

Fire up an agent, tell it what to change and leave it to churn away in a branch or worktree somewhere in the background.

I usually use asynchronous coding agents for this such as Gemini Jules, OpenAI Codex web, or Claude Code on the web. That way I can run those refactoring jobs without interrupting my flow on my laptop.

Evaluate the result in a Pull Request. If it's good, land it. If it's almost there, prompt it and tell it what to do differently. If it's bad, throw it away.

The cost of these code improvements has dropped so low that we can afford a zero tolerance attitude to minor code smells and inconveniences.

AI tools let us consider more options

Any software development task comes with a wealth of options for approaching the problem. Some of the most significant technical debt comes from making poor choices at the planning step - missing out on an obvious simple solution, or picking a technology that later turns out not to be exactly the right fit.

LLMs can help ensure we don't miss any obvious solutions that may not have crossed our radar before. They'll only suggest solutions that are common in their training data but those tend to be the Boring Technology that's most likely to work.

More importantly, coding agents can help with exploratory prototyping.

The best way to make confident technology choices is to prove that they are fit for purpose with a prototype.

Is Redis a good choice for the activity feed on a site which expects thousands of concurrent users?

The best way to know for sure is to wire up a simulation of that system and run a load test against it to see what breaks.

Coding agents can build this kind of simulation from a single well crafted prompt, which drops the cost of this kind of experiment to almost nothing. And since they're so cheap we can run multiple experiments at once, testing several solutions to pick the one that is the best fit for our problem.

Embrace the compound engineering loop

Agents follow instructions. We can evolve these instructions over time to get better results from future runs, based on what we've learned previously.

Dan Shipper and Kieran Klaassen at Every describe their company's approach to working with coding agents as Compound Engineering. Every coding project they complete ends with a retrospective, which they call the compound step where they take what worked and document that for future agent runs.

If we want the best results from our agents, we should aim to continually increase the quality of our codebase over time. Small improvements compound. Quality enhancements that used to be time-consuming have now dropped in cost to the point that there's no excuse not to invest in quality at the same time as shipping new features. Coding agents mean we can finally have both.

Tags: coding-agents, ai-assisted-programming, generative-ai, agentic-engineering, ai, llms

Production query plans without production data

2026-03-09 23:05:15

Production query plans without production data

Radim Marek describes the new pg_restore_relation_stats() and pg_restore_attribute_stats() functions that were introduced in PostgreSQL 18 in September 2025.

The PostgreSQL query planner makes use of internal statistics to help it decide how to best execute a query. These statistics often differ between production data and development environments, which means the query plans used in production may not be replicable in development.

PostgreSQL's new features now let you copy those statistics down to your development environment, allowing you to simulate the plans for production workloads without needing to copy in all of that data first.

I found this illustrative example useful:

SELECT pg_restore_attribute_stats(
    'schemaname', 'public',
    'relname', 'test_orders',
    'attname', 'status',
    'inherited', false::boolean,
    'null_frac', 0.0::real,
    'avg_width', 9::integer,
    'n_distinct', 5::real,
    'most_common_vals', '{delivered,shipped,cancelled,pending,returned}'::text,
    'most_common_freqs', '{0.95,0.015,0.015,0.015,0.005}'::real[]
);

This simulates statistics for a status column that is 95% delivered. Based on these statistics PostgreSQL can decide to use an index for status = 'shipped' but to instead perform a full table scan for status = 'delivered'.

These statistics are pretty small. Radim says:

Statistics dumps are tiny. A database with hundreds of tables and thousands of columns produces a statistics dump under 1MB. The production data might be hundreds of GB. The statistics that describe it fit in a text file.

I posted on the SQLite user forum asking if SQLite could offer a similar feature and D. Richard Hipp promptly replied that it has one already:

All of the data statistics used by the query planner in SQLite are available in the sqlite_stat1 table (or also in the sqlite_stat4 table if you happen to have compiled with SQLITE_ENABLE_STAT4). That table is writable. You can inject whatever alternative statistics you like.

This approach to controlling the query planner is mentioned in the documentation: https://sqlite.org/optoverview.html#manual_control_of_query_plans_using_sqlite_stat_tables.

See also https://sqlite.org/lang_analyze.html#fixed_results_of_analyze.

The ".fullschema" command in the CLI outputs both the schema and the content of the sqlite_statN tables, exactly for the reasons outlined above - so that we can reproduce query problems for testing without have to load multi-terabyte database files.

Via Lobste.rs

Tags: databases, postgresql, sql, sqlite, d-richard-hipp

Perhaps not Boring Technology after all

2026-03-09 21:37:45

A recurring concern I've seen regarding LLMs for programming is that they will push our technology choices towards the tools that are best represented in their training data, making it harder for new, better tools to break through the noise.

This was certainly the case a couple of years ago, when asking models for help with Python or JavaScript appeared to give much better results than questions about less widely used languages.

With the latest models running in good coding agent harnesses I'm not sure this continues to hold up.

I'm seeing excellent results with my brand new tools where I start by prompting "use uvx showboat --help / rodney --help / chartroom --help to learn about these tools" - the context length of these new models is long enough that they can consume quite a lot of documentation before they start working on a problem.

Drop a coding agent into any existing codebase that uses libraries and tools that are too private or too new to feature in the training data and my experience is that it works just fine - the agent will consult enough of the existing examples to understand patterns, then iterate and test its own output to fill in the gaps.

This is a surprising result. I thought coding agents would prove to be the ultimate embodiment of the Choose Boring Technology approach, but in practice they don't seem to be affecting my technology choices in that way at all.

Update: A few follow-on thoughts:

  1. The issue of what technology LLMs recommend is a separate one. What Claude Code Actually Chooses is an interesting recent study where Edwin Ong and Alex Vikati where they proved Claude Code over 2,000 times and found a strong bias towards build-over-buy but also identified a preferred technical stack, with GitHub Actions, Stripe, and shadcn/ui seeing a "near monopoly" in their respective categories. For the sake of this post my interest is in what happens when the human makes a technology choice that differs from those preferred by the model harness.
  2. The Skills mechanism that is being rapidly embraced by most coding agent tools is super-relevant here. We are already seeing projects release official skills to help agents use them - here are examples from Remotion, Supabase, Vercel, and Prisma.

Tags: ai, generative-ai, llms, ai-assisted-programming, boring-technology, coding-agents, agentic-engineering, november-2025-inflection

Quoting Joseph Weizenbaum

2026-03-08 22:59:48

What I had not realized is that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.

Joseph Weizenbaum, creator of ELIZA, in 1976 (via)

Tags: ai-ethics, ai, computer-history, internet-archive

Codex for Open Source

2026-03-08 02:13:39

Codex for Open Source

Anthropic announced six months of free Claude Max for maintainers of popular open source projects (5,000+ stars or 1M+ NPM downloads) on 27th February.

Now OpenAI have launched their comparable offer: six months of ChatGPT Pro (same $200/month price as Claude Max) with Codex and "conditional access to Codex Security" for core maintainers.

Unlike Anthropic they don't hint at the exact metrics they care about, but the application form does ask for "information such as GitHub stars, monthly downloads, or why the project is important to the ecosystem."

Via @openaidevs

Tags: open-source, ai, openai, generative-ai, llms, codex-cli

Quoting Ally Piechowski

2026-03-07 05:58:33

Questions for developers:

  • “What’s the one area you’re afraid to touch?”
  • “When’s the last time you deployed on a Friday?”
  • “What broke in production in the last 90 days that wasn’t caught by tests?”

Questions for the CTO/EM:

  • “What feature has been blocked for over a year?”
  • “Do you have real-time error visibility right now?”
  • “What was the last feature that took significantly longer than estimated?”

Questions for business stakeholders:

  • “Are there features that got quietly turned off and never came back?”
  • “Are there things you’ve stopped promising customers?”

Ally Piechowski, How to Audit a Rails Codebase

Tags: technical-debt, software-engineering, rails