Simon Willison

Creator of Datasette and Lanyrd, co-creator of the Django Web Framework.

RSS preview of Simon Willison's blog

Introducing Claude Sonnet 4.6

2026-02-18 07:58:58

Sonnet 4.6 is out today, and Anthropic claim it offers similar performance to November's Opus 4.5 while maintaining the Sonnet pricing of $3/million input and $15/million output tokens (the Opus models are $5/$25). Here's the system card PDF.
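To make the price gap concrete, here is a minimal sketch of the arithmetic using the per-million-token prices quoted above. The function names and the token counts are hypothetical, chosen only for illustration, and the sketch ignores the higher rate that applies to the 1 million token beta context.

```shell
#!/bin/bash
# Prices are dollars per million tokens, so one token costs that many
# micro-dollars. Integer arithmetic keeps the result exact.
cost_microdollars() {
    local in_price=$1 out_price=$2 in_tokens=$3 out_tokens=$4
    echo $(( in_price * in_tokens + out_price * out_tokens ))
}

# Render micro-dollars as a dollar amount with six decimal places.
format_dollars() {
    printf '$%d.%06d\n' $(( $1 / 1000000 )) $(( $1 % 1000000 ))
}

# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
format_dollars "$(cost_microdollars 3 15 10000 2000)"   # Sonnet pricing
format_dollars "$(cost_microdollars 5 25 10000 2000)"   # Opus pricing
```

For that made-up request the sketch prints $0.060000 at Sonnet prices and $0.100000 at Opus prices.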

Sonnet 4.6 has a "reliable knowledge cutoff" of August 2025, compared to Opus 4.6's May 2025 and Haiku 4.5's February 2025. Both Opus and Sonnet default to 200,000 max input tokens but can stretch to 1 million in beta and at a higher cost.

I just released llm-anthropic 0.24 with support for both Sonnet 4.6 and Opus 4.6. Claude Code did most of the work - the new models had a fiddly amount of extra details around adaptive thinking and no longer supporting prefixes, as described in Anthropic's migration guide.

Here's what I got from:

uvx --with llm-anthropic llm 'Generate an SVG of a pelican riding a bicycle' -m claude-sonnet-4.6

The pelican has a jaunty top hat with a red band. There is a string between the upper and lower beaks for some reason. The bicycle frame is warped in the wrong way.

The SVG comments include:

<!-- Hat (fun accessory) -->

I tried a second time and also got a top hat. Sonnet 4.6 apparently loves top hats!

For comparison, here's the pelican Opus 4.5 drew me in November:

The pelican is cute and looks pretty good. The bicycle is not great - the frame is wrong and the pelican is facing backwards when the handlebars appear to be forwards. There is also something that looks a bit like an egg on the handlebars.

And here's Anthropic's current best pelican, drawn by Opus 4.6 on February 5th:

Slightly wonky bicycle frame but an excellent pelican, very clear beak and pouch, nice feathers.

Opus 4.6 produces the best pelican beak/pouch. I do think the top hat from Sonnet 4.6 is a nice touch though.

Via Hacker News

Tags: ai, generative-ai, llms, llm, anthropic, claude, llm-pricing, pelican-riding-a-bicycle, llm-release, claude-code

Rodney v0.4.0

2026-02-18 07:02:33

My Rodney CLI tool for browser automation has attracted quite the flurry of PRs since I announced it last week. Here are the release notes for the just-released v0.4.0:
  • Errors now use exit code 2, which means exit code 1 is just for check failures. #15
  • New rodney assert command for running JavaScript tests, exit code 1 if they fail. #19
  • New directory-scoped sessions with --local/--global flags. #14
  • New reload --hard and clear-cache commands. #17
  • New rodney start --show option to make the browser window visible. Thanks, Antonio Cuni. #13
  • New rodney connect PORT command to debug an already-running Chrome instance. Thanks, Peter Fraenkel. #12
  • New RODNEY_HOME environment variable to support custom state directories. Thanks, Senko Rašić. #11
  • New --insecure flag to ignore certificate errors. Thanks, Jakub Zgoliński. #10
  • Windows support: avoid Setsid on Windows via build-tag helpers. Thanks, adm1neca. #18
  • Tests now run on windows-latest and macos-latest in addition to Linux.
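The split between exit code 1 (a check failed) and exit code 2 (something went wrong) lets a script react differently to each. Here is a rough sketch of that idea. The function names and labels are hypothetical, treating 0 as success is the usual shell convention rather than something the release notes state, and the "unknown" label for other codes is my assumption. Stand-in commands are used so the sketch runs without Rodney installed.

```shell
#!/bin/bash
# Map an exit status to the meaning described in the release notes.
classify_exit() {
    case "$1" in
        0) echo "pass" ;;
        1) echo "check-failed" ;;
        2) echo "error" ;;
        *) echo "unknown" ;;   # assumption: other codes are not documented here
    esac
}

# Run any command, discard its output, and classify its exit status.
run_and_classify() {
    local status=0
    "$@" >/dev/null 2>&1 || status=$?
    classify_exit "$status"
}

# Stand-ins for real rodney invocations.
run_and_classify true             # prints "pass"
run_and_classify false            # prints "check-failed"
run_and_classify sh -c 'exit 2'   # prints "error"
```

In a real script the stand-ins would be replaced by a command such as rodney exists "h1", so that a failed check can be reported as a test failure while an error stops the run.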

I've been using Showboat to create demos of new features - here are the ones for rodney assert, rodney reload --hard, rodney exit codes, and rodney start --local.

The rodney assert command is pretty neat: you can now use Rodney to test a web app through multiple steps in a shell script that looks something like this (adapted from the README):

#!/bin/bash
set -euo pipefail

FAIL=0

check() {
    if ! "$@"; then
        echo "FAIL: $*"
        FAIL=1
    fi
}

rodney start
rodney open "https://example.com"
rodney waitstable

# Assert elements exist
check rodney exists "h1"

# Assert key elements are visible
check rodney visible "h1"
check rodney visible "#main-content"

# Assert JS expressions
check rodney assert 'document.title' 'Example Domain'
check rodney assert 'document.querySelectorAll("p").length' '2'

# Assert accessibility requirements
check rodney ax-find --role navigation

rodney stop

if [ "$FAIL" -ne 0 ]; then
    echo "Some checks failed"
    exit 1
fi
echo "All checks passed"

Tags: browsers, projects, testing, annotated-release-notes, rodney

Quoting ROUGH DRAFT 8/2/66

2026-02-17 22:49:04

This is the story of the United Space Ship Enterprise. Assigned a five year patrol of our galaxy, the giant starship visits Earth colonies, regulates commerce, and explores strange new worlds and civilizations. These are its voyages... and its adventures.

ROUGH DRAFT 8/2/66, before the Star Trek opening narration reached its final form

Tags: screen-writing, science-fiction

First kākāpō chick in four years hatches on Valentine's Day

2026-02-17 22:09:43

First chick of the 2026 breeding season!

Kākāpō Yasmine hatched an egg fostered from kākāpō Tīwhiri on Valentine's Day, bringing the total number of kākāpō to 237 – though it won’t be officially added to the population until it fledges.

Here's why the egg was fostered:

"Kākāpō mums typically have the best outcomes when raising a maximum of two chicks. Biological mum Tīwhiri has four fertile eggs this season already, while Yasmine, an experienced foster mum, had no fertile eggs."

And an update from conservation biologist Andrew Digby - a second chick hatched this morning!

The second #kakapo chick of the #kakapo2026 breeding season hatched this morning: Hine Taumai-A1-2026 on Ako's nest on Te Kākahu. We transferred the egg from Anchor two nights ago. This is Ako's first-ever chick, which is just a few hours old in this video.

That post has a video of mother and chick.

A beautiful charismatic green kākāpō feeding a little grey chick

Via MetaFilter

Tags: kakapo

Quoting Dimitris Papailiopoulos

2026-02-17 22:04:44

But the intellectually interesting part for me is something else. I now have something close to a magic box where I throw in a question and a first answer comes back basically for free, in terms of human effort. Before this, the way I'd explore a new idea is to either clumsily put something together myself or ask a student to run something short for signal, and if it's there, we’d go deeper. That quick signal step, i.e., finding out if a question has any meat to it, is what I can now do without taking up anyone else's time. It’s now between just me, Claude Code, and a few days of GPU time.

I don’t know what this means for how we do research long term. I don’t think anyone does yet. But the distance between a question and a first answer just got very small.

Dimitris Papailiopoulos, on running research questions through Claude Code

Tags: research, coding-agents, claude-code, generative-ai, ai, llms

Nano Banana Pro diff to webcomic

2026-02-17 12:51:58

AI-accelerated software development brings the threat of cognitive debt: more projects, and less deep understanding of how they work and what they actually do. It's interesting to consider artifacts that might be able to help.

Nathan Baschez on Twitter:

my current favorite trick for reducing "cognitive debt" (h/t @simonw ) is to ask the LLM to write two versions of the plan:

  1. The version for it (highly technical and detailed)
  2. The version for me (an entertaining essay designed to build my intuition)

Works great

This inspired me to try something new. I generated the diff between v0.5.0 and v0.6.0 of my Showboat project - which introduced the remote publishing feature - and dumped that into Nano Banana Pro with the prompt:

Create a webcomic that explains the new feature as clearly and entertainingly as possible

Here's what it produced:

A six-panel comic strip illustrating a tool called "Showboat" for live-streaming document building. Panel 1, titled "THE OLD WAY: Building docs was a lonely voyage. You finished it all before anyone saw it.", shows a sad bearded man on a wooden boat labeled "THE LOCALHOST" holding papers and saying "Almost done... then I have to export and email the HTML...". Panel 2, titled "THE UPGRADE: Just set the environment variable!", shows the same man excitedly plugging in a device with a speech bubble reading "ENV VAR: SHOWBOAT_REMOTE_URL" and the sound effect "*KA-CHUNK!*". Panel 3, titled "init establishes the uplink and generates a unique UUID beacon.", shows the man typing at a keyboard with a terminal reading "$ showboat init 'Live Demo'", a satellite dish transmitting to a floating label "UUID: 550e84...", and a monitor reading "WAITING FOR STREAM...". Panel 4, titled "Every note and exec is instantly beamed to the remote viewer!", shows the man coding with sound effects "*HAMMER!*", "ZAP!", "ZAP!", "BANG!" as red laser beams shoot from a satellite dish to a remote screen displaying "NOTE: Step 1..." and "SUCCESS". Panel 5, titled "Even image files are teleported in real-time!", shows a satellite dish firing a cyan beam with the sound effect "*FOOMP!*" toward a monitor displaying a bar chart. Panel 6, titled "You just build. The audience gets the show live.", shows the man happily working at his boat while a crowd of cheering people watches a projected screen reading "SHOWBOAT LIVE STREAM: Live Demo", with a label "UUID: 550e84..." and one person in the foreground eating popcorn.

Good enough to publish with the release notes? I don't think so. I'm sharing it here purely to demonstrate the idea. Creating assets like this as a personal tool for thinking about novel ways to explain a feature feels worth exploring further.
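For anyone wanting to try the same trick, the diff-plus-prompt step could be assembled along these lines. This is only a sketch: build_prompt is a hypothetical helper, the tag names in the usage comment are assumed to match the version numbers, and how the combined text gets handed to the image model is not covered here.

```shell
#!/bin/bash
# Combine an instruction with a diff read from stdin into one text payload:
# the instruction first, a blank line, then the diff unchanged.
build_prompt() {
    local instruction=$1
    printf '%s\n\n' "$instruction"
    cat
}

# Hypothetical usage inside a checkout that has both tags:
#   git diff v0.5.0 v0.6.0 | build_prompt "Create a webcomic that explains the new feature as clearly and entertainingly as possible" > prompt.txt
```

Keeping the instruction ahead of the diff is a design choice, not a requirement. It just makes the request easy to spot at the top of a long payload.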

Tags: nano-banana, gemini, llms, cognitive-debt, generative-ai, ai, text-to-image, showboat, ai-assisted-programming