MoreRSS

site iconSimon WillisonModify

Creator of Datasette and Lanyrd, co-creator of the Django Web Framework.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Simon Willison

Quoting Nicholas Carlini

2025-11-20 09:01:44

Previously, when malware developers wanted to go and monetize their exploits, they would do exactly one thing: encrypt every file on a person's computer and request a ransome to decrypt the files. In the future I think this will change.

LLMs allow attackers to instead process every file on the victim's computer, and tailor a blackmail letter specifically towards that person. One person may be having an affair on their spouse. Another may have lied on their resume. A third may have cheated on an exam at school. It is unlikely that any one person has done any of these specific things, but it is very likely that there exists something that is blackmailable for every person. Malware + LLMs, given access to a person's computer, can find that and monetize it.

Nicholas Carlini, Are large language models worth it? Misuse: malware at scale

Tags: ai-ethics, generative-ai, nicholas-carlini, ai, llms

Building more with GPT-5.1-Codex-Max

2025-11-20 07:15:10

Building more with GPT-5.1-Codex-Max

Hot on the heels of yesterday's Gemini 3 Pro release comes a new model from OpenAI called GPT-5.1-Codex-Max.

(Remember when GPT-5 was meant to bring in a new era of less confusing model names? That didn't last!)

It's currently only available through their Codex CLI coding agent, where it's the new default model:

Starting today, GPT‑5.1-Codex-Max will replace GPT‑5.1-Codex as the default model in Codex surfaces. Unlike GPT‑5.1, which is a general-purpose model, we recommend using GPT‑5.1-Codex-Max and the Codex family of models only for agentic coding tasks in Codex or Codex-like environments.

It's not available via the API yet but should be shortly.

The timing of this release is interesting given that Gemini 3 Pro appears to have aced almost all of the benchmarks just yesterday. It's reminiscent of the period in 2024 when OpenAI consistently made big announcements that happened to coincide with Gemini releases.

OpenAI's self-reported SWE-Bench Verified score is particularly notable: 76.5% for thinking level "high" and 77.9% for the new "xhigh". That was the one benchmark where Gemini 3 Pro was out-performed by Claude Sonnet 4.5 - Gemini 3 Pro got 76.2% and Sonnet 4.5 got 77.2%. OpenAI now have the highest scoring model there by a full .7 of a percentage point!

They also report a score of 58.1% on Terminal Bench 2.0, beating Gemini 3 Pro's 54.2% (and Sonnet 4.5's 42.8%.)

The most intriguing part of this announcement concerns the model's approach to long context problems:

GPT‑5.1-Codex-Max is built for long-running, detailed work. It’s our first model natively trained to operate across multiple context windows through a process called compaction, coherently working over millions of tokens in a single task. [...]

Compaction enables GPT‑5.1-Codex-Max to complete tasks that would have previously failed due to context-window limits, such as complex refactors and long-running agent loops by pruning its history while preserving the most important context over long horizons. In Codex applications, GPT‑5.1-Codex-Max automatically compacts its session when it approaches its context window limit, giving it a fresh context window. It repeats this process until the task is completed.

There's a lot of confusion on Hacker News about what this actually means. Claude Code already does a version of compaction, automatically summarizing previous turns when the context runs out. Does this just mean that Codex-Max is better at that process?

I had it draw me a couple of pelicans by typing "Generate an SVG of a pelican riding a bicycle" directly into the Codex CLI tool. Here's thinking level medium:

A flat-style illustration shows a white, round-bodied bird with an orange beak pedaling a red-framed bicycle with thin black wheels along a sandy beach, with a calm blue ocean and clear sky in the background.

And here's thinking level "xhigh":

A plump white bird with an orange beak and small black eyes crouches low on a blue bicycle with oversized dark wheels, shown racing forward with motion lines against a soft gradient blue sky.

I also tried xhigh on the my longer pelican test prompt, which came out like this:

A stylized dark gray bird with layered wings, a yellow head crest, and a long brown beak leans forward in a racing pose on a black-framed bicycle, riding across a glossy blue surface under a pale sky.

Also today: GPT-5.1 Pro is rolling out today to all Pro users. According to the ChatGPT release notes:

GPT-5.1 Pro is rolling out today for all ChatGPT Pro users and is available in the model picker. GPT-5 Pro will remain available as a legacy model for 90 days before being retired.

That's a pretty fast deprecation cycle for the GPT-5 Pro model that was released just three months ago.

Via Hacker News

Tags: ai, openai, generative-ai, llms, evals, pelican-riding-a-bicycle, llm-release, gpt-5, codex-cli

How I automate my Substack newsletter with content from my blog

2025-11-20 06:00:34

I sent out my weekly-ish Substack newsletter this morning and took the opportunity to record a YouTube video demonstrating my process and describing the different components that make it work. There's a lot of digital duct tape involved, taking the content from Django+Heroku+PostgreSQL to GitHub Actions to SQLite+Datasette+Fly.io to JavaScript+Observable and finally to Substack.

The core process is the same as I described back in 2023. I have an Observable notebook called blog-to-newsletter which fetches content from my blog's database, filters out anything that has been in the newsletter before, formats what's left as HTML and offers a big "Copy rich text newsletter to clipboard" button.

Screenshot of the interface. An item in a list says 9080: Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark. A huge button reads Copy rich text newsletter to clipboard - below is a smaller button that says Copy just the links/quotes/TILs. A Last X days slider is set to 2. There are checkboxes for SKip content sent in prior newsletters and only include post content prior to the cutoff comment.

I click that button, paste the result into the Substack editor, tweak a few things and hit send. The whole process usually takes just a few minutes.

I make very minor edits:

  • I set the title and the subheading for the newsletter. This is often a direct copy of the title of the featured blog post.
  • Substack turns YouTube URLs into embeds, which often isn't what I want - especially if I have a YouTube URL inside a code example.
  • Blocks of preformatted text often have an extra blank line at the end, which I remove.
  • Occasionally I'll make a content edit - removing a piece of content that doesn't fit the newsletter, or fixing a time reference like "yesterday" that doesn't make sense any more.
  • I pick the featured image for the newsletter and add some tags.

That's the whole process!

The Observable notebook

The most important cell in the Observable notebook is this one:

raw_content = {
  return await (
    await fetch(
      `https://datasette.simonwillison.net/simonwillisonblog.json?sql=${encodeURIComponent(
        sql
      )}&_shape=array&numdays=${numDays}`
    )
  ).json();
}

This uses the JavaScript fetch() function to pull data from my blog's Datasette instance, using a very complex SQL query that is composed elsewhere in the notebook.

Here's a link to see and execute that query directly in Datasette. It's 143 lines of convoluted SQL that assembles most of the HTML for the newsletter using SQLite string concatenation! An illustrative snippet:

with content as (
  select
    id,
    'entry' as type,
    title,
    created,
    slug,
    '<h3><a href="' || 'https://simonwillison.net/' || strftime('%Y/', created)
      || substr('JanFebMarAprMayJunJulAugSepOctNovDec', (strftime('%m', created) - 1) * 3 + 1, 3) 
      || '/' || cast(strftime('%d', created) as integer) || '/' || slug || '/' || '">' 
      || title || '</a> - ' || date(created) || '</h3>' || body
      as html,
    'null' as json,
    '' as external_url
  from blog_entry
  union all
  # ...

My blog's URLs look like /2025/Nov/18/gemini-3/ - this SQL constructs that three letter month abbreviation from the month number using a substring operation.

This is a terrible way to assemble HTML, but I've stuck with it because it amuses me.

The rest of the Observable notebook takes that data, filters out anything that links to content mentioned in the previous newsletters and composes it into a block of HTML that can be copied using that big button.

Here's the recipe it uses to turn HTML into rich text content on a clipboard suitable for Substack. I can't remember how I figured this out but it's very effective:

Object.assign(
  html`<button style="font-size: 1.4em; padding: 0.3em 1em; font-weight: bold;">Copy rich text newsletter to clipboard`,
  {
    onclick: () => {
      const htmlContent = newsletterHTML;
      // Create a temporary element to hold the HTML content
      const tempElement = document.createElement("div");
      tempElement.innerHTML = htmlContent;
      document.body.appendChild(tempElement);
      // Select the HTML content
      const range = document.createRange();
      range.selectNode(tempElement);
      // Copy the selected HTML content to the clipboard
      const selection = window.getSelection();
      selection.removeAllRanges();
      selection.addRange(range);
      document.execCommand("copy");
      selection.removeAllRanges();
      document.body.removeChild(tempElement);
    }
  }
)

From Django+Postgresql to Datasette+SQLite

My blog itself is a Django application hosted on Heroku, with data stored in Heroku PostgreSQL. Here's the source code for that Django application. I use the Django admin as my CMS.

Datasette provides a JSON API over a SQLite database... which means something needs to convert that PostgreSQL database into a SQLite database that Datasette can use.

My system for doing that lives in the simonw/simonwillisonblog-backup GitHub repository. It uses GitHub Actions on a schedule that executes every two hours, fetching the latest data from PostgreSQL and converting that to SQLite.

My db-to-sqlite tool is responsible for that conversion. I call it like this:

db-to-sqlite \
  $(heroku config:get DATABASE_URL -a simonwillisonblog | sed s/postgres:/postgresql+psycopg2:/) \
  simonwillisonblog.db \
  --table auth_permission \
  --table auth_user \
  --table blog_blogmark \
  --table blog_blogmark_tags \
  --table blog_entry \
  --table blog_entry_tags \
  --table blog_quotation \
  --table blog_quotation_tags \
  --table blog_note \
  --table blog_note_tags \
  --table blog_tag \
  --table blog_previoustagname \
  --table blog_series \
  --table django_content_type \
  --table redirects_redirect

That heroku config:get DATABASE_URL command uses Heroku credentials in an environment variable to fetch the database connection URL for my blog's PostgreSQL database (and fixes a small difference in the URL scheme).

db-to-sqlite can then export that data and write it to a SQLite database file called simonwillisonblog.db.

The --table options specify the tables that should be included in the export.

The repository does more than just that conversion: it also exports the resulting data to JSON files that live in the repository, which gives me a commit history of changes I make to my content. This is a cheap way to get a revision history of my blog content without having to mess around with detailed history tracking inside the Django application itself.

At the end of my GitHub Actions workflow is this code that publishes the resulting database to Datasette running on Fly.io using the datasette publish fly plugin:

datasette publish fly simonwillisonblog.db \
  -m metadata.yml \
  --app simonwillisonblog-backup \
  --branch 1.0a2 \
  --extra-options "--setting sql_time_limit_ms 15000 --setting truncate_cells_html 10000 --setting allow_facet off" \
  --install datasette-block-robots \
  # ... more plugins

As you can see, there are a lot of moving parts! Surprisingly it all mostly just works - I rarely have to intervene in the process, and the cost of those different components is pleasantly low.

Tags: blogging, django, javascript, postgresql, sql, sqlite, youtube, heroku, datasette, observable, github-actions, fly, newsletter

Quoting Matthew Prince

2025-11-19 16:02:36

Cloudflare's network began experiencing significant failures to deliver core network traffic [...] triggered by a change to one of our database systems' permissions which caused the database to output multiple entries into a “feature file” used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network. [...] The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail. [...]

This resulted in the following panic which in turn resulted in a 5xx error:

thread fl2_worker_thread panicked: called Result::unwrap() on an Err value

Matthew Prince, Cloudflare outage on November 18, 2025, see also this comment

Tags: scaling, postmortem, cloudflare, rust

llm-gemini 0.27

2025-11-19 07:00:40

llm-gemini 0.27

New release of my LLM plugin for Google's Gemini models:
  • Support for nested schemas in Pydantic, thanks Bill Pugh. #107
  • Now tests against Python 3.14.
  • Support for YouTube URLs as attachments and the media_resolution option. Thanks, Duane Milne. #112
  • New model: gemini-3-pro-preview. #113

The YouTube URL feature is particularly neat, taking advantage of this API feature. I used it against the Google Antigravity launch video:

llm -m gemini-3-pro-preview \
 -a 'https://www.youtube.com/watch?v=nTOVIGsqCuY' \
 'Summary, with detailed notes about what this thing is and how it differs from regular VS Code, then a complete detailed transcript with timestamps'

Here's the result. A spot-check of the timestamps against points in the video shows them to be exactly right.

Tags: projects, youtube, ai, generative-ai, llms, llm, gemini

MacWhisper has Automatic Speaker Recognition now

2025-11-19 06:19:26

Inspired by this conversation on Hacker News I decided to upgrade MacWhisper to try out NVIDIA Parakeet and the new Automatic Speaker Recognition feature.

It appears to work really well! Here's the result against this 39.7MB m4a file from my Gemini 3 Pro write-up this morning:

A screenshot of the MacWhisper transcription application interface displaying a file named "HMB_compressed." The center panel shows a transcript of a City Council meeting. Speaker 2 begins, "Thank you, Mr. Mayor, uh City Council... Victor Hernandez, Spanish interpreter," followed by Spanish instructions: "Buenas noches, les queremos dejar saber a todos ustedes que pueden acceder lo que es el canal de Zoom..." Speaker 1 responds, "Thank you. Appreciate that. Can we please have a roll call?" Speaker 3 then calls out "Councilmember Johnson?" and "Councilmember Nagengast?" to which Speaker 1 answers, "Here." The interface includes metadata on the right indicating the model "Parakeet v3" and a total word count of 26,109.

You can export the transcript with both timestamps and speaker names using the Share -> Segments > .json menu item:

A close-up of the MacWhisper interface showing the export dropdown menu with "Segments" selected. A secondary menu lists various file formats including .txt, .csv, and .pdf, with a red arrow pointing specifically to the ".json" option, set against the background of the meeting transcript.

Here's the resulting JSON.

Tags: whisper, nvidia, ai, speech-to-text, macwhisper