Blog of Armin Ronacher

I'm currently located in Austria and working as a Director of Engineering for Sentry. Aside from that I do open source development.

Skills vs Dynamic MCP Loadouts

2025-12-13 08:00:00

I’ve been moving all my MCPs to skills, including the remaining one I still used: the Sentry MCP [1]. Previously I had already moved entirely away from Playwright to a Playwright skill.

In the last month or so there have been discussions about using dynamic tool loadouts to defer loading of tool definitions until later. Anthropic has also been toying around with the idea of wiring together MCP calls via code, something I have experimented with.

I want to share my updated findings on all of this and explain why the deferred tool loading that Anthropic came up with does not fix my lack of love for MCP. Maybe they are useful for someone else.

What is a Tool?

When the model encounters a tool definition, it is encouraged, through reinforcement learning or otherwise, to emit tool calls via special tokens whenever it hits a situation where that tool would be appropriate. For all intents and purposes, tool definitions can only appear between special tool definition tokens in a system prompt. Historically this means that you cannot emit tool definitions later in the conversation. So your only real option is for a tool to be loaded when the conversation starts.
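
To make that concrete, here is a rough sketch of what a client sends: the tool definitions travel up front with the rest of the request and get serialized into those special tokens on the server side. The shape loosely follows Anthropic's Messages API; the model name and the tool itself are just illustrative.

const request = {
  model: "claude-sonnet-4-5",  // illustrative
  system: "You are a helpful agent.",
  tools: [
    {
      name: "get_weather",
      description: "Get the current weather for a city.",
      input_schema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
  messages: [{ role: "user", content: "What's the weather in Boston?" }],
};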

In agentic uses, you can of course compress your conversation state or change the tool definitions in the system message at any point. But the consequence is that you will lose the reasoning traces and also the cache. In the case of Anthropic, for instance, this will make your conversation significantly more expensive. You would basically start from scratch and pay full token rates plus cache write cost, compared to cache read.

One recent innovation from Anthropic is deferred tool loading. You still declare tools ahead of time in the system message, but they are not injected into the conversation when the initial system message is emitted. Instead they appear at a later point. The tool definitions however still have to be static for the entire conversation, as far as I know. So the tools that could exist are defined when the conversation starts. The way Anthropic discovers the tools is purely by regex search.

Contrasting with Skills

This is all quite relevant because even though MCP with deferred loading feels like it should perform better, it actually requires quite a bit of engineering on the LLM API side. The skill system gets away without any of that and, at least from my experience, still outperforms it.

Skills are really just short summaries of which skills exist and in which file the agent can learn more about them. These are proactively loaded into the context. So the agent understands in the system context (or maybe somewhere later in the context) what capabilities it has and gets a link to the manual for how to use them.

Crucially, skills do not actually load a tool definition into the context. The tools remain the same: bash and the other tools the agent already has. All it learns from the skill are tips and tricks for how to use these tools more effectively.
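
For illustration, a minimal sketch of what such a skill might look like on disk, assuming the SKILL.md convention: the YAML frontmatter carries the short summary that gets loaded proactively, while the body is the manual that is only read on demand.

---
name: sentry
description: Look up Sentry issues and events. Use when debugging production errors.
---

# Sentry

Tips for querying Sentry from the shell go here: which helper script to
run, how to authenticate, what the query syntax looks like, and a few
worked examples.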

Because the main thing it learns is how to use other command line tools and similar utilities, the fundamentals of how to chain and coordinate them together do not actually change. The reinforcement learning that made the Claude family of models very good tool callers just helps with these newly discovered tools.

MCP as Skills?

So that obviously raises the question: if skills work so well, can I move the MCP outside of the context entirely and invoke it through the CLI in a similar way as Anthropic proposes? The answer is yes, you can, but it doesn’t work well. One option here is Peter Steinberger’s mcporter. In short, it reads the .mcp.json files and exposes the MCPs behind it as callable tools:

npx mcporter call 'linear.create_comment(issueId: "ENG-123", body: "Looks good!")'

And yes, it looks very much like a command line tool that the LLM can invoke. The problem however is that the LLM does not have any idea about what tools are available, and now you need to teach it that. So you might think: why not make some skills that teach the LLM about the MCPs? Here the issue for me comes from the fact that MCP servers have no desire to maintain API stability. They are increasingly starting to trim down tool definitions to the bare minimum to preserve tokens. This makes sense, but for the skill pattern it’s not what you want. For instance, the Sentry MCP server at one point switched the query syntax entirely to natural language. A great improvement for the agent, but my suggestions for how to use it became a hindrance and I did not discover the issue straight away.

This is in fact quite similar to Anthropic’s deferred tool loading: there is no information about the tool in the context at all, so you need to create a summary. The eager loading of MCP tools we have done in the past has left us with an awkward compromise: the description is both too long to load eagerly and too short to really tell the agent how to use the tool. So at least from my experience, you end up maintaining these manual skill summaries for MCP tools exposed via mcporter or similar.

Path Of Least Resistance

This leads me to my current conclusion: I tend to go with what is easiest, which is to ask the agent to write its own tools as a skill. Not only does it not take all that long, but the biggest benefit is that the tool is largely under my control. Whenever it breaks or needs some other functionality, I ask the agent to adjust it. The Sentry MCP is a great example. I think it’s probably one of the better designed MCPs out there, but I don’t use it anymore. In part because when I load it into the context right away I lose around 8k tokens out of the box, and I could not get it to work via mcporter. On the other hand, I have Claude maintain a skill for me. And yes, that skill is probably quite buggy and needs to be updated, but because the agent maintains it, it works out better.

It’s quite likely that all of this will change, but at the moment manually maintained skills and agents writing their own tools have become my preferred way. I suspect that dynamic tool loading with MCP will become a thing, but it will probably require quite a few protocol changes to bring in skill-like summaries and built-in manuals for the tools. I also suspect that MCP would greatly benefit from protocol stability. The fact that MCP servers keep changing their tool descriptions at will does not work well with materialized calls and external tool descriptions in READMEs and skill files.

  1. Keen readers will remember that last time, the last MCP I used was Playwright. In the meantime I added and removed two more MCPs: Linear and Sentry, mostly because of authentication issues and neither having a great command line interface.

Let’s Destroy The European Union!

2025-12-09 08:00:00

Elon Musk is not happy with the EU fining his X platform and is currently on a tweet rampage complaining about it. Among other things, he wants the whole EU to be abolished. Sadly, he is hardly the first wealthy American to share his opinions on European politics lately. I’m not a fan of this outside attention, but I believe it’s noteworthy and something to pay attention to, in particular because the idea of destroying and ripping apart the EU is not just popular in the US; it’s popular over here too. That greatly concerns me.

We Have Genuine Problems

There is definitely a bunch of stuff we might want to fix over here. I have complained about our culture before. Unfortunately, I happen to think that our challenges are not coming from politicians or civil servants, but from us, the people. Europeans don’t like to take risks and are quite pessimistic about the future compared to their US counterparts. Additionally, we Europeans have been trained to feel a lot of guilt over the years, which makes us hesitant to stand up for ourselves. This has led to all kinds of interesting counter-cultural movements in Europe, like years of significant support for unregulated immigration and an unhealthy obsession with the idea of degrowth. Today, though, neither seems quite as popular as it once was.

Morally these things may be defensible, but in practice they have led to Europe losing its competitive edge and eroding social cohesion. The combination of a strong social state and high taxes in particular does not mix well with the kind of immigration we have seen in the last decade: mostly people escaping wars ending up in low-skilled jobs. That means it’s not unlikely that certain classes of immigrants are going to be net-negative for a very long time, if not forever, and increasingly society is starting to think about what the implications of that might be.

Yet even all of that is not where our problems lie, and it’s certainly not our presumed lack of free speech. Any conversation on that topic is foolish because it’s too nuanced. Society clearly wants to place some limits on free speech here, but the same is true in the US. In the US we can currently see a significant push-back against “woke ideologies,” and a lot of that push-back involves restricting freedom of expression through different avenues.

America Likes a Weak Europe

The US might try to lecture Europe right now on free speech, but what it should be lecturing us on is our economic model. Europe has too much fragmentation, incredibly strict regulation that harms innovation, ineffective capital markets, and a massive dependency on both the United States and China. If the US were to cut us off from their cloud providers, we would not be able to operate anything over here. If China were to stop shipping us chips, we would be in deep trouble too (we have seen this).

This is painful because the US is historically a great example when it comes to freedom of information, direct democracy at the state level, and rather low corruption. These are all areas where we’re not faring well, at least not consistently, and we should be lectured. Fundamentally, the US approach to capitalism is about as good as it’s going to get. If there was any doubt that alternative approaches might have worked out better, at this point there’s very little evidence in favor of that. Yet because of increased loss of civil liberties in the US, many Europeans now see everything that the US is doing as bad. A grave mistake.

Both China and the US are quite happy with the dependency we have on them and with us falling short of our potential. Europe’s attempt at dealing with the dependency so far has been to regulate and tax US corporations more heavily. That’s not a good strategy. The solution must be to become competitive again so that we can redirect that tax revenue to local companies instead. The Digital Services Act is a good example: we’re punishing Apple and forcing them to open up their platform, but we have no company that can take advantage of that opening.

Europe is Europe’s Biggest Problem

If you read my blog here, you might remember my musings about the lack of clarity of what a foreigner is in Europe. The reality is that Europe has been deeply integrated for a long time now as a result of how the EU works — but still not at the same level as the US. I think this is still the biggest problem. People point to languages as the challenge, but underneath the hood, the countries are still fighting each other. Austria wants to protect its local stores from larger competition in Germany and its carpenters from the cheaper ones coming from Slovenia. You can replace Austria with any other EU country and you will find the same thing.

The EU might not be perfect, but it’s hard to imagine that abolishing it would solve any problem, given how nation states have shown themselves to behave. The moment the EU fell away, we would be warming up all the old border struggles again. We have already seen similar issues pop up in Northern Ireland after the UK left.

And we just have so much bureaucracy, so many non-functioning social systems, and such a tremendous amount of incoming governmental debt to support our flailing pension schemes. We need growth more than any other bloc, and we have such a low probability of actually accomplishing that.

Given how the EU is structured, it also acts as the punching bag for the failure of the nation states to come to agreements. It’s not that EU bureaucrats are telling Europeans to take in immigrants, to enact chat control, or to mandate cookie banners and attached plastic caps. Those are all initiatives that come from one or more member states. But the EU in the end will always take the blame, because even local politicians who voted in support of some of these things can easily point towards “Brussels” as having created the problem.

The United States of Europe

A Europe in pieces does not sound appealing to me at all, and that’s because I can look at what China and the US have.

What China and the US have that Europe lacks is a strong national identity. Both countries have recognized that strength comes from unity. China in particular is fighting any kind of regionalism tooth and nail. The US has accomplished this through the pledge of allegiance, a civil war, the Department of Education pushing a common narrative in schools, and historically putting post offices and infrastructure everywhere. Europe has none of that. More importantly, Europeans don’t even want it. There is a mistaken belief that we can just become these tiny states again and be fine.

If Europe wants to be competitive, it seems unlikely that this can be accomplished without becoming a unified superpower. Yet there is no belief in Europe that this can or should happen, and the other superpowers have little interest in seeing it happen either.

What Would Fixing Actually Look Like?

If I had to propose something constructive, it would be this: Europe needs to stop pretending it can be 27 different countries with 27 different economic policies while also being a single market. The half-measures are killing us. We have a common currency in the Eurozone but no common fiscal policy. We have freedom of movement but wildly different social systems. We have common regulations but fragmented enforcement: 27 labor laws, 27 legal systems, 27 tax codes, complex VAT rules, and so on.

The Draghi report from last year laid out many of these issues quite clearly: Europe needs massive investment in technology and infrastructure. It needs a genuine single market for services, not just goods. It needs capital markets that can actually fund startups at scale. None of this is news to anyone paying attention.

But here’s the uncomfortable truth: none of this will happen without Europeans accepting that more integration is the answer, not less. And right now, the political momentum is in the opposite direction. Every country wants the benefits of the EU without the obligations. Every country wants to protect its own industries while accessing everyone else’s markets.

One of the arguments against deeper integration hinges on some quite unrelated issues. For instance, the EU is seen as non-democratic, but some of that criticism just does not sit right with me. Sure, I too would welcome more democracy in the EU, but at the same time, the system really is not undemocratic today. Take something like chat control: the reason it does not die is that some member states and their elected representatives keep pushing for it.

What stands in the way is that the member countries and their people don’t actually want to strengthen the EU further. The “lack of democracy” is very much intentional and the exact outcome you get if you want to keep the power with the national states.

Foreign Billionaires and European Sovereignty

So back to where we started: should the EU be abolished as Musk suggests? I think this is a profoundly unserious proposal from someone who has little understanding of European history and even less interest in learning. The EU exists because two world wars taught Europeans that nationalism without checks leads to catastrophe. It exists because small countries recognized they have more leverage negotiating as a bloc than individually.

I also take a lot of issue with the idea that European politics should be driven by foreign interests. Neither Russians nor Americans have any good reason to take so much interest in European politics. They are not living here; we are.

Would Europe be more “free” without the EU? Perhaps in some narrow regulatory sense. But it would also be weaker, more divided, and more susceptible to manipulation by larger powers — including the United States. I also find it somewhat rich that American tech billionaires are calling for the dissolution of the EU while they are greatly benefiting from the open market it provides. Their companies extract enormous value from the European market, more than even local companies are able to.

The real question isn’t whether Europe should have less regulation or more freedom. It’s whether we Europeans can find the political will to actually complete the project we started. A genuine federation with real fiscal transfers, a common defense policy, and a unified foreign policy would be a superpower. What we have now is a compromise that satisfies nobody and leaves us vulnerable to exactly the kind of pressure Musk and other oligarchs represent.

A Different Path

Europe doesn’t need fixing in the way the loud present-day critics suggest. It doesn’t need to become more like America or abandon its social model entirely. What it needs is to decide what it actually wants to be. The current state of perpetual ambiguity is unsustainable.

It also should not lose its values. Europeans might no longer be quite as hot on the human rights that the EU provides, and they might no longer want to have the same level of immigration. Yet simultaneously, Europeans are presented with a reality that needs all of these things. We’re all highly dependent on movement of labour, and that includes people from abroad. Unfortunately, the wars of the last decade have dominated any migration discourse, and that has created ground for populists to thrive. Any skilled tech migrant is running into the same walls as everyone else, which has made it less and less appealing to come.

Or perhaps we’ll continue muddling through, which historically has been Europe’s preferred approach. It’s not inspiring, but it’s also not the catastrophe the internet would have you believe.

Is there reason to be optimistic? On a long enough timeline the graph goes up and to the right. We might be going through some rough patches, but structurally the whole thing here is still pretty solid. And it’s not as if the rest of the world is cruising along smoothly: the US, China, and Russia are each dealing with their own crises. That shouldn’t serve as an excuse, but it does offer context. As bleak as things can feel, we’re not alone in having challenges, but ours are uniquely ours and we will face them. One way or another.

LLM APIs are a Synchronization Problem

2025-11-22 08:00:00

The more I work with large language models through provider-exposed APIs, the more I feel like we have built ourselves into quite an unfortunate API surface area. It might not actually be the right abstraction for what’s happening under the hood. The way I like to think about this problem now is that it’s actually a distributed state synchronization problem.

At its core, a large language model takes text, tokenizes it into numbers, and feeds those tokens through a stack of matrix multiplications and attention layers on the GPU. Using a large set of fixed weights, it produces activations and predicts the next token. If it weren’t for temperature (randomization), you could think of it as having the potential to be a much more deterministic system, at least in principle.

As far as the core model is concerned, there’s no magical distinction between “user text” and “assistant text”—everything is just tokens. The only difference comes from special tokens and formatting that encode roles (system, user, assistant, tool), injected into the stream via the prompt template. You can look at the system prompt templates on Ollama for the different models to get an idea.
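
As a sketch of what such a template does (the ChatML-style markers below are illustrative; every model family bakes its own special tokens into its prompt template):

function renderPrompt(messages) {
  // Serialize role-tagged messages into one flat token stream.
  return messages
    .map((m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>`)
    .join("\n") + "\n<|im_start|>assistant\n";  // cue the model to respond
}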

The Basic Agent State

Let’s ignore for a second which APIs already exist and just think about what usually happens in an agentic system. If I were to have my LLM run locally on the same machine, there is still state to be maintained, but that state is very local to me. You’d maintain the conversation history as tokens in RAM, and the model would keep a derived “working state” on the GPU — mainly the attention key/value cache built from those tokens. The weights themselves stay fixed; what changes per step are the activations and the KV cache.

One further clarification: when I talk about state I don’t just mean the visible token history because the model also carries an internal working state that isn’t captured by simply re-sending tokens. In other words: you can replay the tokens and regain the text content, but you won’t restore the exact derived state the model had built.

From a mental-model perspective, caching means “remember the computation you already did for a given prefix so you don’t have to redo it.” Internally, that usually means storing the attention KV cache for those prefix tokens on the server and letting you reuse it, not literally handing you raw GPU state.
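
A toy sketch of that mental model, assuming nothing about any real inference server: the derived state for an already-seen prefix is stored under a hash of its tokens, so only the new suffix has to be recomputed.

import { createHash } from "node:crypto";

const kvCache = new Map();  // prefix hash -> opaque derived state

function keyFor(tokens) {
  return createHash("sha256").update(tokens.join(",")).digest("hex");
}

function process(tokens, runModel) {
  // If we already have state for everything but the last token, only the
  // suffix goes through the model; otherwise we compute from scratch.
  const prefix = tokens.slice(0, -1);
  const cached = kvCache.get(keyFor(prefix));
  const state = cached
    ? runModel(tokens.slice(-1), cached)
    : runModel(tokens, undefined);
  kvCache.set(keyFor(tokens), state);
  return state;
}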

There are probably some subtleties to this that I’m missing, but I think this is a pretty good model to think about it.

The Completion API

The moment you’re working with completion-style APIs such as OpenAI’s or Anthropic’s, abstractions are put in place that make things a little different from this very simple system. The first difference is that you’re not actually sending raw tokens around. The way the GPU looks at the conversation history and the way you look at it are on fundamentally different levels of abstraction. While you could count and manipulate tokens on one side of the equation, extra tokens are being injected into the stream that you can’t see. Some of those tokens come from converting the JSON message representation into the underlying input tokens fed into the machine. But you also have things like tool definitions, which are injected into the conversation in proprietary ways. Then there’s out-of-band information such as cache points.

And beyond that, there are tokens you will never see. For instance, with reasoning models you often don’t see any real reasoning tokens, because some LLM providers try to hide as much as possible so that you can’t retrain your own models with their reasoning state. On the other hand, they might give you some other informational text so that you have something to show to the user. Model providers also love to hide search results and how those results were injected into the token stream. Instead, you only get an encrypted blob back that you need to send back to continue the conversation. All of a sudden, you need to take some information on your side and funnel it back to the server so that state can be reconciled on either end.

In completion-style APIs, each new turn requires resending the entire prompt history. The size of each individual request grows linearly with the number of turns, but the cumulative amount of data sent over a long conversation grows quadratically because each linear-sized history is retransmitted at every step. This is one of the reasons long chat sessions feel increasingly expensive. On the server, the model’s attention cost over that sequence also grows quadratically in sequence length, which is why caching starts to matter.
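
To make the arithmetic concrete: if each turn adds roughly k tokens, turn n resends about n·k tokens of history, so after n turns you have transmitted about k·(1 + 2 + … + n) = k·n(n+1)/2 tokens in total. Each individual request grows linearly; the cumulative traffic grows quadratically.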

The Responses API

One of the ways OpenAI tried to address this problem was to introduce the Responses API, which maintains the conversational history on the server (at least in the version with the saved state flag). But now you’re in a bizarre situation where you’re fully dealing with state synchronization: there’s hidden state on the server and state on your side, but the API gives you very limited synchronization capabilities. To this point, it remains unclear to me how long you can actually continue that conversation. It’s also unclear what happens if there is state divergence or corruption. I’ve seen the Responses API get stuck in ways where I couldn’t recover it. It’s also unclear what happens if there’s a network partition, or if one side got the state update but the other didn’t. The Responses API with saved state is quite a bit harder to use, at least as it’s currently exposed.

Obviously, for OpenAI it’s great because it allows them to hide more behind-the-scenes state that would otherwise have to be funneled through with every conversation message.

State Sync API

Regardless of whether you’re using a completion-style API or the Responses API, the provider always has to inject additional context behind the scenes—prompt templates, role markers, system/tool definitions, sometimes even provider-side tool outputs—that never appears in your visible message list. Different providers handle this hidden context in different ways, and there’s no common standard for how it’s represented or synchronized. The underlying reality is much simpler than the message-based abstractions make it look: if you run an open-weights model yourself, you can drive it directly with token sequences and design APIs that are far cleaner than the JSON-message interfaces we’ve standardized around. The complexity gets even worse when you go through intermediaries like OpenRouter or SDKs like the Vercel AI SDK, which try to mask provider-specific differences but can’t fully unify the hidden state each provider maintains. In practice, the hardest part of unifying LLM APIs isn’t the user-visible messages—it’s that each provider manages its own partially hidden state in incompatible ways.

It really comes down to how you pass this hidden state around in one form or another. I understand that from a model provider’s perspective, it’s nice to be able to hide things from the user. But synchronizing hidden state is tricky, and none of these APIs have been built with that mindset, as far as I can tell. Maybe it’s time to start thinking about what a state synchronization API would look like, rather than a message-based API.

The more I work with these agents, the more I feel like I don’t actually need a unified message API. The core idea of it being message-based in its current form is itself an abstraction that might not survive the passage of time.

Learn From Local First?

There’s a whole ecosystem that has dealt with this kind of mess before: the local-first movement. Those folks spent a decade figuring out how to synchronize distributed state across clients and servers that don’t trust each other, drop offline, fork, merge, and heal. Peer-to-peer sync and conflict-free replicated storage engines all exist because “shared state but with gaps and divergence” is a hard problem that nobody could solve with naive message passing. Their architectures explicitly separate canonical state, derived state, and transport mechanics — exactly the kind of separation missing from most LLM APIs today.

Some of those ideas map surprisingly well to models: KV caches resemble derived state that could be checkpointed and resumed; prompt history is effectively an append-only log that could be synced incrementally instead of resent wholesale; provider-side invisible context behaves like a replicated document with hidden fields.
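
As a sketch of what that could mean in practice (all names here are hypothetical, not a real protocol): treat the conversation as an append-only log where only the unsynced suffix goes over the wire, and where full replay stays possible.

class ConversationLog {
  constructor() {
    this.entries = [];  // append-only message log (canonical state)
    this.acked = 0;     // how many entries the server has confirmed
  }
  append(role, content) {
    this.entries.push({ index: this.entries.length, role, content });
  }
  pendingSync() {
    // Only the unacknowledged suffix needs to be transmitted.
    return this.entries.slice(this.acked);
  }
  confirm(upTo) {
    this.acked = upTo;
  }
  replayAll() {
    // If the server wiped its derived state, rebuild from the log.
    return this.entries;
  }
}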

At the same time though, if the remote state gets wiped because the remote site doesn’t want to hold it for that long, we would want to be in a situation where we can replay it entirely from scratch — which for instance the Responses API today does not allow.

Future Unified APIs

There’s been plenty of talk about unifying message-based APIs, especially in the wake of MCP (Model Context Protocol). But if we ever standardize anything, it should start from how these models actually behave, not from the surface conventions we’ve inherited. A good standard would acknowledge hidden state, synchronization boundaries, replay semantics, and failure modes — because those are real issues. There is always the risk that we rush to formalize the current abstractions and lock in their weaknesses and faults. I don’t know what the right abstraction looks like, but I’m increasingly doubtful that the status-quo solutions are the right fit.

Agent Design Is Still Hard

2025-11-21 08:00:00

I felt like it might be a good time to write about some new things I’ve learned. Most of this is going to be about building agents, with a little bit about using agentic coding tools.

TL;DR: Building agents is still messy. SDK abstractions break once you hit real tool use. Caching works better when you manage it yourself, but differs between models. Reinforcement ends up doing more heavy lifting than expected, and failures need strict isolation to avoid derailing the loop. Shared state via a file-system-like layer is an important building block. Output tooling is surprisingly tricky, and model choice still depends on the task.

Which Agent SDK To Target?

When you build your own agent, you have the choice of targeting an underlying SDK like the OpenAI SDK or the Anthropic SDK, or you can go with a higher level abstraction such as the Vercel AI SDK or Pydantic. The choice we made a while back was to adopt the Vercel AI SDK but only the provider abstractions, and to basically drive the agent loop ourselves. At this point we would not make that choice again. There is absolutely nothing wrong with the Vercel AI SDK, but when you are trying to build an agent, two things happen that we originally didn’t anticipate:

The first is that the differences between models are significant enough that you will need to build your own agent abstraction. We have not found a solution in any of these SDKs that builds the right abstraction for an agent. I think this is partly because, despite the basic agent design being just a loop, there are subtle differences based on the tools you provide. These differences affect how easy or hard it is to find the right abstraction (cache control, different requirements for reinforcement, tool prompts, provider-side tools, etc.). Because the right abstraction is not yet clear, using the original SDKs from the dedicated platforms keeps you fully in control. With some of these higher-level SDKs you have to build on top of their existing abstractions, which might not be the ones you actually want in the end.

We also found it incredibly challenging to work with the Vercel SDK when it comes to dealing with provider-side tools. The attempted unification of messaging formats doesn’t quite work. For instance, the web search tool from Anthropic routinely destroys the message history with the Vercel SDK, and we haven’t yet fully figured out the cause. Also, in Anthropic’s case, cache management is much easier when targeting their SDK directly instead of the Vercel one. The error messages when you get things wrong are much clearer.

This might change, but right now we would probably not use an abstraction when building an agent, at least until things have settled down a bit. The benefits do not yet outweigh the costs for us.

Someone else might have figured it out. If you’re reading this and think I’m wrong, please drop me a mail. I want to learn.

Caching Lessons

The different platforms have very different approaches to caching. A lot has been said about this already, but Anthropic makes you pay for caching. It makes you manage cache points explicitly, and this really changes the way you interact with it from an agent engineering level. I initially found the manual management pretty dumb. Why doesn’t the platform do this for me? But I’ve fully come around and now vastly prefer explicit cache management. It makes costs and cache utilization much more predictable.

Explicit caching allows you to do certain things that are much harder otherwise. For instance, you can split off a conversation and have it run in two different directions simultaneously. You also have the opportunity to do context editing. The optimal strategy here is unclear, but you clearly have a lot more control, and I really like having that control. It also makes it much easier to understand the cost of the underlying agent. You can assume much more about how well your cache will be utilized, whereas with other platforms we found it to be hit and miss.

The way we do caching in the agent with Anthropic is pretty straightforward. One cache point is after the system prompt. Two cache points are placed at the beginning of the conversation, where the last one moves up with the tail of the conversation. And then there is some optimization along the way that you can do.
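
A rough sketch of that layout against Anthropic's SDK directly, simplified to two of the cache points (the model name and helper are illustrative, not our actual code):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function agentTurn(systemPrompt, history, latestUserMessage) {
  return client.messages.create({
    model: "claude-sonnet-4-5",  // illustrative
    max_tokens: 4096,
    system: [
      // Cache point after the static system prompt.
      { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    ],
    messages: [
      ...history,  // earlier turns carry their own cache markers
      {
        role: "user",
        content: [
          // The tail cache point moves up as the conversation grows.
          { type: "text", text: latestUserMessage, cache_control: { type: "ephemeral" } },
        ],
      },
    ],
  });
}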

Because the system prompt and the tool selection now have to be mostly static, we feed a dynamic message later to provide information such as the current time. Otherwise, this would trash the cache. We also leverage reinforcement during the loop much more.

Reinforcement In The Agent Loop

Every time the agent runs a tool you have the opportunity to not just return data that the tool produces, but also to feed more information back into the loop. For instance, you can remind the agent about the overall objective and the status of individual tasks. You can also provide hints about how the tool call might succeed when a tool fails. Another use of reinforcement is to inform the system about state changes that happened in the background. If you have an agent that uses parallel processing, you can inject information after every tool call when that state changed and when it is relevant for completing the task.

Sometimes it’s enough for the agent to self-reinforce. In Claude Code, for instance, the todo write tool is a self-reinforcement tool. All it does is take from the agent a list of tasks that it thinks it should do and echo out what came in. It’s basically just an echo tool; it really doesn’t do anything else. But that is enough to drive the agent forward better than if the only task and subtask were given at the beginning of the context and too much has happened in the meantime.
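
A sketch of such an echo tool (not Claude Code's actual definition, just the shape of the idea):

const todoWriteTool = {
  name: "todo_write",
  description: "Record your current task list whenever the plan changes.",
  input_schema: {
    type: "object",
    properties: {
      todos: {
        type: "array",
        items: {
          type: "object",
          properties: {
            task: { type: "string" },
            status: { type: "string", enum: ["pending", "in_progress", "done"] },
          },
          required: ["task", "status"],
        },
      },
    },
    required: ["todos"],
  },
};

// The handler is literally just an echo back into the context.
function handleTodoWrite({ todos }) {
  return todos.map((t) => `[${t.status}] ${t.task}`).join("\n");
}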

We also use reinforcements to inform the system if the environment changed during execution in a way that’s problematic for the agent. For instance, if our agent fails and retries from a certain step forward but the recovery operates off broken data, we inject a message informing it that it might want to back off a couple of steps and redo an earlier step.

Isolate Failures

If you expect a lot of failures during code execution, there is an opportunity to hide those failures from the context. This can happen in two ways. One is to run tasks that might require iteration individually. You would run them in a subagent until they succeed and only report back the success, plus maybe a brief summary of approaches that did not work. It is helpful for an agent to learn about what did not work in a subtask because it can then feed that information into the next task to hopefully steer away from those failures.

The second option doesn’t exist in all agents or foundation models, but with Anthropic you can do context editing. So far we haven’t had a lot of success with context editing, but we believe it’s an interesting thing we would love to explore more. We would also love to learn if people have success with it. What is interesting about context editing is that you should be able to preserve tokens for further down the iteration loop. You can take out of the context certain failures that didn’t drive towards successful completion of the loop, but only negatively affected certain attempts during execution. But as with the point I made earlier: it is also useful for the agent to understand what didn’t work, but maybe it doesn’t require the full state and full output of all the failures.

Unfortunately, context editing will automatically invalidate caches. There is really no way around it. So it can be unclear when the trade-off of doing that compensates for the extra cost of trashing the cache.

Sub Agents / Sub Inference

As I mentioned a couple of times on this blog already, most of our agents are based on code execution and code generation. That really requires a common place for the agent to store data. Our choice is a file system—in our case a virtual file system—but that requires different tools to access it. This is particularly important if you have something like a subagent or subinference.

You should try to build an agent that doesn’t have dead ends. A dead end is where a task can only continue executing within the sub-tool that you built. For instance, you might build a tool that generates an image, but is only able to feed that image back into one more tool. That’s a problem because you might then want to put those images into a zip archive using the code execution tool. So there needs to be a system that allows the image generation tool to write the image to the same place where the code execution tool can read it. In essence, that’s a file system.

Obviously it has to go the other way around too. You might want to use the code execution tool to unpack a zip archive and then go back to inference to describe all the images so that the next step can go back to code execution and so forth. The file system is the mechanism that we use for that. But it does require tools to be built in a way that they can take file paths to the virtual file system to work with.

So basically an ExecuteCode tool would have access to the same file system as the RunInference tool which could take a path to a file on that same virtual file system.
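
In sketch form (every name here is hypothetical), the contract is simply that tools accept and return paths on the shared virtual file system:

// All tools go through one shared virtual file system, so no tool is a
// dead end. callImageModel and describeWithModel are assumed helpers.
async function generateImageTool(fs, { prompt, outPath }) {
  const png = await callImageModel(prompt);
  await fs.write(outPath, png);  // code execution can pick this up later
  return `wrote image to ${outPath}`;
}

async function runInferenceTool(fs, { prompt, inputPath }) {
  const bytes = await fs.read(inputPath);  // e.g. unpacked from a zip earlier
  return describeWithModel(prompt, bytes);
}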

The Use Of An Output Tool

One interesting thing about how we structured our agent is that it does not represent a chat session. It will eventually communicate something to the user or the outside world, but all the messages that it sends in between are usually not revealed. The question is: how does it create that message? We have one tool which is the output tool. The agent uses it explicitly to communicate to the human. We then use a prompt to instruct it when to use that tool. In our case the output tool sends an email.

But that turns out to pose a few other challenges. One is that it’s surprisingly hard to steer the wording and tone of that output tool compared to just using the main agent loop’s text output as the mechanism to talk to the user. I cannot say why this is, but I think it’s probably related to how these models are trained.

One attempt that didn’t work well was to have the output tool run another quick LLM like Gemini 2.5 Flash to adjust the tone to our preference. But this increases latency and actually reduces the quality of the output. In part, I think the model just doesn’t word things correctly and the subtool doesn’t have sufficient context. Providing more slices of the main agentic context into the subtool makes it expensive and also didn’t fully solve the problem. It also sometimes reveals information in the final output that we didn’t want to be there, like the steps that led to the end result.

Another problem with an output tool is that sometimes it just doesn’t call the tool. One of the ways in which we’re forcing this is we remember if the output tool was called. If the loop ends without the output tool, we inject a reinforcement message to encourage it to use the output tool.
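
In sketch form (the helper names are hypothetical), that enforcement looks roughly like this:

let outputToolCalled = false;

while (true) {
  const result = await runAgentStep(messages);  // assumed agent-loop step
  for (const call of result.toolCalls) {
    if (call.name === "send_output") outputToolCalled = true;
    messages.push(await executeTool(call));  // assumed helper
  }
  if (!result.finished) continue;
  if (outputToolCalled) break;
  // Ended without communicating anything: reinforce and go around again.
  messages.push({
    role: "user",
    content: "You finished without reporting a result. Use the send_output tool to communicate your findings.",
  });
}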

Model Choice

Overall our choices for models haven’t dramatically changed so far. I think Haiku and Sonnet are still the best tool callers available, so they make for excellent choices in the agent loop. They are also somewhat transparent with regards to what the RL looks like. The other obvious choices are the Gemini models. We so far haven’t found a ton of success with the GPT family of models for the main loop.

For the individual sub-tools, which in part might also require inference, our current choice is Gemini 2.5 if you need to summarize large documents or work with PDFs and things like that. That is also a pretty good model for extracting information from images, in particular because the Sonnet family of models likes to run into a safety filter which can be annoying.

There’s also the fairly obvious realization that token cost alone doesn’t define how expensive an agent is. A better tool caller will do the job in fewer tokens. There are models available today that are cheaper than Sonnet, but they are not necessarily cheaper in a loop.

But all things considered, not that much has changed in the last couple of weeks.

Testing and Evals

We find testing and evals to be the hardest problem here. This is not entirely surprising, but the agentic nature makes it even harder. Unlike prompts, you cannot just do the evals in some external system because there’s too much you need to feed into it. This means you want to do evals based on observability data or instrumenting your actual test runs. So far none of the solutions we have tried have convinced us that they found the right approach here. Unfortunately, I have to report that at the moment we haven’t found something that really makes us happy. I hope we’re going to find a solution for this because it is becoming an increasingly frustrating aspect of building an agent.

Coding Agent Updates

As for my experience with coding agents, not really all that much has changed. The main new development is that I’m trialing Amp more. In case you’re curious why: it’s not that it’s objectively a better agent than what I’m using, but I really quite like the way they’re thinking about agents from what they’re posting. The interaction of the different sub agents, like the Oracle, with the main loop is beautifully done, and not many other harnesses do this today. It’s also a good way for me to validate how different agent designs work. Amp, similar to Claude Code, really feels like a product built by people who also use their own tool. I do not feel every other agent in the industry does this.

Quick Stuff I Read And Found

That’s just a random assortment of things that I feel might also be worth sharing:

  • What if you don’t need MCP at all?: Mario argues that many MCP servers are overengineered and include large toolsets that consume lots of context. He proposes a minimalist approach for browser-agent use-cases by relying on simple CLI tools (e.g., start, navigate, evaluate JS, screenshot) executed via Bash, which keeps token usage small and workflows flexible. I built a Claude/Amp Skill out of it.
  • The fate of “small” open source: The author argues that the age of tiny, single-purpose open-source libraries is coming to an end, largely because built-in platform APIs and AI tools can now generate simple utilities on demand. Thank fucking god.
  • Tmux is love. There is no article that goes with it, but the TLDR is that Tmux is great. If you have anything that remotely looks like an interactive system that an agent should work with, you should give it some Tmux skills.
  • LLM APIs are a Synchronization Problem. This was a separate realization that was too long for this post, so I wrote a separate one.

Absurd Workflows: Durable Execution With Just Postgres

2025-11-03 08:00:00

It’s probably no surprise to you that we’re building agents somewhere. Everybody does it. Building a good agent, however, brings back some of the historic challenges involving durable execution.

Entirely unsurprisingly, a lot of people are now building durable execution systems. Many of these, however, are incredibly complex and require you to sign up for another third-party service. I generally try to avoid bringing in extra complexity if I can avoid it, so I wanted to see how far I can go with just Postgres. To this end, I wrote Absurd [1], a tiny SQL-only library with a very thin SDK to enable durable workflows on top of just Postgres — no extension needed.

Durable Execution 101

Durable execution (or durable workflows) is a way to run long-lived, reliable functions that can survive crashes, restarts, and network failures without losing state or duplicating work. Durable execution can be thought of as the combination of a queue system and a state store that remembers the most recently seen execution state.

Because Postgres is excellent at queues thanks to SELECT ... FOR UPDATE SKIP LOCKED, you can use it for the queue (e.g., with pgmq). And because it’s a database, you can also use it to store the state.
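
To illustrate the pattern (a generic sketch with a made-up schema, not Absurd's actual SQL): each worker atomically claims one pending task, and SKIP LOCKED lets concurrent workers pass over rows another worker is already holding.

import { Pool } from "pg";

const pool = new Pool();

async function claimTask() {
  const { rows } = await pool.query(
    `UPDATE tasks
        SET state = 'running', claimed_at = now()
      WHERE id = (
        SELECT id FROM tasks
         WHERE state = 'pending'
         ORDER BY enqueued_at
         FOR UPDATE SKIP LOCKED
         LIMIT 1
      )
      RETURNING *`
  );
  return rows[0] ?? null;  // null means the queue is empty right now
}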

The state is important. With durable execution, instead of running your logic in memory, the goal is to decompose a task into smaller pieces (step functions) and record every step and decision. When the process stops (whether it fails, intentionally suspends, or a machine dies) the engine can replay those events to restore the exact state and continue where it left off, as if nothing happened.

Absurd At A High Level

Absurd at the core is a single .sql file (absurd.sql) which needs to be applied to a database of your choice. That SQL file’s goal is to move the complexity of SDKs into the database. SDKs then make the system convenient by abstracting the low-level operations in a way that leverages the ergonomics of the language you are working with.

The system is very simple: A task dispatches onto a given queue from where a worker picks it up to work on. Tasks are subdivided into steps, which are executed in sequence by the worker. Tasks can be suspended or fail, and when that happens, they execute again (a run). The result of a step is stored in the database (a checkpoint). To avoid repeating work, checkpoints are automatically loaded from the state storage in Postgres again.

Additionally, tasks can sleep or suspend for events and wait until they are emitted. Events are cached, which means they are race-free.

With Agents

What is the relationship of agents with workflows? Normally, workflows are DAGs defined by a human ahead of time. AI agents, on the other hand, define their own adventure as they go. That means they are basically a workflow with mostly a single step that iterates over changing state until it determines that it has completed. Absurd enables this by automatically counting up steps if they are repeated:

absurd.registerTask({name: "my-agent"}, async (params, ctx) => {
  let messages = [{role: "user", content: params.prompt}];
  let step = 0;
  while (step++ < 20) {
    const { newMessages, finishReason } = await ctx.step("iteration", async () => {
      return await singleStep(messages);
    });
    messages.push(...newMessages);
    if (finishReason !== "tool-calls") {
      break;
    }
  }
});

This defines a single task named my-agent, and it has just a single step. The return value is the changed state, but the current state is passed in as an argument. Every time the step function is executed, the data is looked up first from the checkpoint store. The first checkpoint will be iteration, the second iteration#2, iteration#3, etc. Each state only stores the new messages it generated, not the entire message history.

If a step fails, the task fails and will be retried. And because of checkpoint storage, if you crash in step 5, the first 4 steps will be loaded automatically from the store. Steps are never retried, only tasks.

How do you kick it off? Simply enqueue it:

await absurd.spawn("my-agent", {
  prompt: "What's the weather like in Boston?"
}, {
  maxAttempts: 3,
});

And if you are curious, this is an example implementation of the singleStep function used above:

Single step function
async function singleStep(messages) {
  const result = await generateText({
    model: anthropic("claude-haiku-4-5"),
    system: "You are a helpful agent",
    messages,
    tools: {
      getWeather: { /* tool definition here */ }
    },
  });

  const newMessages = (await result.response).messages;
  const finishReason = await result.finishReason;

  if (finishReason === "tool-calls") {
    const toolResults = [];
    for (const toolCall of result.toolCalls) {
      /* handle tool calls here */
      if (toolCall.toolName === "getWeather") {
        const toolOutput = await getWeather(toolCall.input);
        toolResults.push({
          toolName: toolCall.toolName,
          toolCallId: toolCall.toolCallId,
          type: "tool-result",
          output: {type: "text", value: toolOutput},
        });
      }
    }
    newMessages.push({
      role: "tool",
      content: toolResults
    });
  }

  return { newMessages, finishReason };
}

Events and Sleeps

And like Temporal and other solutions, you can yield if you want. If you want to come back to a problem in 7 days, you can do so:

await ctx.sleep(60 * 60 * 24 * 7); // sleep for 7 days

Or if you want to wait for an event:

const eventName = `email-confirmation-${userId}`;
try {
  const payload = await ctx.waitForEvent(eventName, {timeout: 60 * 5});
  // handle event payload
} catch (e) {
  if (e instanceof TimeoutError) {
    // handle timeout
  } else {
    throw e;
  }
}

Which someone else can emit:

const eventName = `email-confirmation-${userId}`;
await absurd.emitEvent(eventName, { confirmedAt: new Date().toISOString() });

That’s it!

Really, that’s it. There is really not much to it. It’s just a queue and a state store — that’s all you need. There is no compiler plugin and no separate service or whole runtime integration. Just Postgres. That’s not to throw shade on these other solutions; they are great. But not every problem necessarily needs to scale to that level of complexity, and you can get quite far with much less. Particularly if you want to build software that other people should be able to self-host, that might be quite appealing.

  1. It’s named Absurd because durable workflows are absurdly simple, but have been overcomplicated in recent years.

Regulation Isn’t the European Trap — Resignation Is

2025-10-21 08:00:00

Plenty has been written about how hard it is to build in Europe versus the US. The list is always the same, with little progress: brittle politics, dense bureaucracy, mandatory notaries, endless and rigid KYC and AML processes. Fine. I know, you know.

I’m not here to add another complaint to the pile (but if we meet over a beer or coffee, I’m happy to unload a lot of hilarious anecdotes on you). The unfortunate reality is that most of these constraints won’t change in my lifetime and maybe ever. Europe is not culturally aligned with entrepreneurship, it’s opposed to the idea of employee equity, and our laws reflect that.

What bothers me isn’t the rules — it’s the posture that develops from them in people who should know better. Across the system, everyone points at someone else. If a process takes 10 steps, you’ll find 10 people who feel absolved of responsibility because they can cite 9 other blockers. Friction becomes a moral license to do a mediocre job (while lamenting it).

The vibe is: “Because the system is slow, I can be slow. Because there are rules, I don’t need judgment. Because there’s risk, I don’t need initiative.” And then we all nod along and nothing moves.

There are excellent people here; I’ve worked with them. But they are fighting upstream against a default of low agency. When the process is bad, too many people collapse into it. Communication narrows to the shortest possible message. Friday after 2pm, the notary won’t reply — and the notary will surely blame labor costs or regulation for why service ends there. The bank will cite compliance for why they don’t need to do anything. The registrar will point at some law that allows them to demand a translation of a document by a court-appointed translator. Everyone has a reason. No one owns the outcome.

Meanwhile, in the US, our counsel replies when it matters, even after hours. Bankers answer the same day. The instinct is to enable progress, not enumerate reasons you can’t have it. The goal is the outcome and the rules are constraints to navigate, not a shield to hide behind.

So what’s the point? I can’t fix politics. What I can do: act with agency, and surround myself with people who do the same and speak in support of it. Work with those who start from “how do we make this work?” not “why this can’t work.” Name the absurdities without using them as cover. Be transparent, move anyway and tell people.

Nothing stops a notary from designing an onboarding flow that gets an Austrian company set up in five days — standardized KYC packets, templated resolutions, scheduled signing slots, clear checklists, async updates, a bias for same-day feedback. That could exist right now. It rarely does, and where it does, it falls short.

Yes, much in Europe is objectively worse for builders. We have to accept it. Then squeeze everything you can from what is in your control:

  • Own the handoff. When you’re step 3 of 10, behave like step 10 depends on you and like you control all 10 steps. Anticipate blockers further down the line. Move same day. Eliminate ambiguity. Close loops.
  • Default to clarity. Send checklists. Preempt the next two questions. Reduce the number of touches.
  • Model urgency without theatrics. Be calm, fast, and precise. Don’t make your customer chase you.
  • Use judgment. Rules exist and we can’t break them all. But we can work with them and be guided by them.

Select for agency. Choose partners who answer promptly when it’s material and who don’t confuse process with progress.

The trap is not only regulation. It’s the learned helplessness it breeds. If we let friction set our standards, we become the friction. We won’t legislate our way to a US-style environment anytime soon. But we don’t need permission to be better operators inside a bad one.

That’s the contrast and it’s the part we control.


Postscript: Comparing Europe to the US triggers people, and I’m conscious of that. Maturity is holding two truths at once: they do some things right and some things wrong, and so do we. You don’t win by talking others down or praying for their failure. I’d rather see both Europe and the US succeed than celebrate Europe failing slightly less.

And no, saying I feel gratitude and happiness when I get a midnight reply doesn’t make me anti-work-life balance (I am not). It means when something is truly time-critical, fast, clear action lifts everyone. The times someone sent a document in minutes, late at night, both sides felt good about it when it mattered. Responsiveness, used with judgment, is not exploitation; it’s respect for outcomes and the relationships we form.