2025-12-13 11:47:43
How to use a skill (progressive disclosure):
- After deciding to use a skill, open its SKILL.md. Read only enough to follow the workflow.
- If SKILL.md points to extra folders such as references/, load only the specific files needed for the request; don't bulk-load everything.
- If scripts/ exist, prefer running or patching them instead of retyping large code blocks.
- If assets/ or templates exist, reuse them instead of recreating from scratch.

Description as trigger: The YAML description in SKILL.md is the primary trigger signal; rely on it to decide applicability. If unsure, ask a brief clarification before proceeding.
— OpenAI Codex CLI, core/src/skills/render.rs, full prompt
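For reference, the SKILL.md being described there is just a Markdown file with YAML frontmatter. A minimal sketch - the frontmatter fields follow Anthropic's published format, and the folder contents referenced here are hypothetical:

---
name: datasette-plugin
description: Use when writing Datasette plugins with Python and pluggy
---

# Writing Datasette plugins

Read references/hooks.md for the full list of plugin hooks.
Run scripts/new_plugin.py to scaffold a new plugin.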
Tags: skills, openai, ai, llms, codex-cli, prompt-engineering, rust, generative-ai
2025-12-13 07:29:51
One of the things that most excited me about Anthropic's new Skills mechanism back in October is how easy it looked for other platforms to implement. A skill is just a folder with a Markdown file and some optional extra resources and scripts, so any LLM tool with the ability to navigate and read from a filesystem should be capable of using them. It turns out OpenAI are doing exactly that, with skills support quietly showing up in both their Codex CLI tool and now also in ChatGPT itself.
I learned about this from Elias Judin this morning. It turns out the Code Interpreter feature of ChatGPT now has a new /home/oai/skills folder which you can access simply by prompting:
Create a zip file of /home/oai/skills
I tried that myself and got back this zip file. Here's a UI for exploring its content (more about that tool).

So far they cover spreadsheets, docx and PDFs. Interestingly their chosen approach for PDFs and documents is to convert them to rendered per-page PNGs and then pass those through their vision-enabled GPT models, presumably to maintain information from layout and graphics that would be lost if they just ran text extraction.
Elias shared copies in a GitHub repo. They look very similar to Anthropic's implementation of the same kind of idea, currently published in their anthropics/skills repository.
I tried it out by prompting:
Create a PDF with a summary of the rimu tree situation right now and what it means for kakapo breeding season
Sure enough, GPT-5.2 Thinking started with:
Reading skill.md for PDF creation guidelines
Then:
Searching rimu mast and Kākāpō 2025 breeding status
It took just over eleven minutes to produce this PDF, which was long enough that I had Claude Code for web build me a custom PDF viewing tool while I waited.
Here's ChatGPT's PDF in that tool.

(I am very excited about Kākāpō breeding season this year.)
The reason it took so long is that it was fastidious about looking at and tweaking its own work. I appreciated that at one point it tried rendering the PDF and noticed that the macrons in kākāpō were not supported by the chosen font, so it switched to something else:

Meanwhile, two weeks ago OpenAI's open source Codex CLI tool landed a PR titled feat: experimental support for skills.md. The most recent docs for that are in docs/skills.md.
The documentation suggests that any folder in ~/.codex/skills will be treated as a skill.
I dug around and found the code that generates the prompt that drives the skill system in codex-rs/core/src/skills/render.rs - here's a Gist with a more readable version of that prompt.
I used Claude Opus 4.5's skill authoring skill to create this skill for creating Datasette plugins, then installed it into my Codex CLI skills folder like this:
git clone https://github.com/datasette/skill \
~/.codex/skills/datasette-plugin

You have to run Codex with the --enable skills option. I ran this:
cd /tmp
mkdir datasette-cowsay
cd datasette-cowsay
codex --enable skills -m gpt-5.2

Then prompted:
list skills
And Codex replied:
- datasette-plugins — Writing Datasette plugins using Python + pluggy (file: /Users/simon/.codex/skills/datasette-plugin/SKILL.md)
- Discovery — How to find/identify available skills (no SKILL.md path provided in the list)
Then I said:
Write a Datasette plugin in this folder adding a /-/cowsay?text=hello page that displays a pre with cowsay from PyPI saying that text
It worked perfectly! Here's the plugin code it wrote and here's a copy of the full Codex CLI transcript, generated with my terminal-to-html tool.
You can try that out yourself if you have uvx installed like this:
uvx --with https://github.com/simonw/datasette-cowsay/archive/refs/heads/main.zip \
datasette

Then visit:
http://127.0.0.1:8001/-/cowsay?text=This+is+pretty+fun

When I first wrote about skills in October I said Claude Skills are awesome, maybe a bigger deal than MCP. The fact that it's just turned December and OpenAI have already leaned into them in a big way reinforces to me that I called that one correctly.
Skills are based on a very light specification, if you could even call it that, but I still think it would be good for these to be formally documented somewhere. This could be a good initiative for the new Agentic AI Foundation (previously) to take on.
Tags: pdf, ai, kakapo, openai, prompt-engineering, generative-ai, chatgpt, llms, ai-assisted-programming, anthropic, coding-agents, gpt-5, codex-cli, skills
2025-12-13 04:20:14
I released a new version of my LLM Python library and CLI tool for interacting with Large Language Models. Highlights from the release notes:
- New OpenAI models: gpt-5.1, gpt-5.1-chat-latest, gpt-5.2 and gpt-5.2-chat-latest. #1300, #1317
- When fetching URLs as fragments using llm -f URL, the request now includes a custom user-agent header: llm/VERSION (https://llm.datasette.io/). #1309
- Fixed a bug where fragments were not correctly registered with their source when using llm chat. Thanks, Giuseppe Rota. #1316
- Fixed some file descriptor leak warnings. Thanks, Eric Bloch. #1313
- Type annotations for the OpenAI Chat, AsyncChat and Completion execute() methods. Thanks, Arjan Mossel. #1315
- The project now uses uv and dependency groups for development. See the updated contributing documentation. #1318
That last bullet point about uv relates to the dependency groups pattern I wrote about in a recent TIL. I'm currently working through applying it to my other projects - the net result is that running the test suite is as simple as doing:
git clone https://github.com/simonw/llm
cd llm
uv run pytest
The new dev dependency group defined in pyproject.toml is automatically installed by uv run in a new virtual environment, which means everything needed to run pytest is available without needing to add any extra commands.
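For anyone who hasn't tried the pattern yet, here's roughly what that section of pyproject.toml looks like - this assumes the PEP 735 dependency groups syntax that uv understands:

[dependency-groups]
# uv run installs this group automatically in the project's virtual environment
dev = [
    "pytest",
]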
Tags: projects, python, ai, annotated-release-notes, generative-ai, llms, llm, uv
2025-12-12 07:58:04
OpenAI reportedly declared a "code red" on the 1st of December in response to increasingly credible competition from the likes of Google's Gemini 3. It's less than two weeks later and they just announced GPT-5.2, calling it "the most capable model series yet for professional knowledge work".
The new model comes in two variants: GPT-5.2 and GPT-5.2 Pro. There's no Mini variant yet.
GPT-5.2 is available via their UI in both "instant" and "thinking" modes, presumably still corresponding to the API concept of different reasoning effort levels.
The knowledge cut-off date for both variants is now August 31st 2025. This is significant - GPT-5.1 and GPT-5 were both September 30th 2024, and GPT-5 mini was May 31st 2024.
Both of the 5.2 models have a 400,000 token context window and 128,000 max output tokens - no different from 5.1 or 5.
Pricing-wise 5.2 is a rare increase - it's 1.4x the cost of GPT-5.1, at $1.75/million input tokens and $14/million output tokens. GPT-5.2 Pro is $21.00/million input and a hefty $168.00/million output, putting it up there with their previous most expensive models, o1 Pro and GPT-4.5.
So far the main benchmark results we have are self-reported by OpenAI. The most interesting ones are a 70.9% score on their GDPval "Knowledge work tasks" benchmark (GPT-5 got 38.8%) and a 52.9% score on ARC-AGI-2 (up from 17.6% for GPT-5.1 Thinking).
The ARC Prize Twitter account provided this interesting note on the efficiency gains for GPT-5.2 Pro:
A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task
Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task
This represents a ~390X efficiency improvement in one year
GPT-5.2 can be accessed in OpenAI's Codex CLI tool like this:
codex -m gpt-5.2
There are three new API models:
OpenAI have published a new GPT-5.2 Prompting Guide. An interesting note from that document is that compaction can now be run with a new dedicated server-side API:
For long-running, tool-heavy workflows that exceed the standard context window, GPT-5.2 with Reasoning supports response compaction via the
/responses/compact endpoint. Compaction performs a loss-aware compression pass over prior conversation state, returning encrypted, opaque items that preserve task-relevant information while dramatically reducing token footprint. This allows the model to continue reasoning across extended workflows without hitting context limits.
One note from the announcement that caught my eye:
GPT‑5.2 Thinking is our strongest vision model yet, cutting error rates roughly in half on chart reasoning and software interface understanding.
I had disappointing results from GPT-5 on an OCR task a while ago. I tried it against GPT-5.2 and it did much better:
llm -m gpt-5.2 ocr -a https://static.simonwillison.net/static/2025/ft.jpeg

Here's the result from that, which cost 1,520 input tokens and 1,022 output tokens for a total of 1.6968 cents.
For my classic "Generate an SVG of a pelican riding a bicycle" test:
llm -m gpt-5.2 "Generate an SVG of a pelican riding a bicycle"
And for the more advanced alternative test, which tests instruction following in a little more depth:
llm -m gpt-5.2 "Generate an SVG of a California brown pelican riding a bicycle. The bicycle
must have spokes and a correctly shaped bicycle frame. The pelican must have its
characteristic large pouch, and there should be a clear indication of feathers.
The pelican must be clearly pedaling the bicycle. The image should show the full
breeding plumage of the California brown pelican."
Tags: ai, openai, generative-ai, llms, llm, pelican-riding-a-bicycle, llm-release, gpt-5
2025-12-11 05:00:59
I've started using the term HTML tools to refer to HTML applications that I've been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I've discovered along the way.
First, some examples to show the kind of thing I'm talking about:
These are some of my recent favorites. I have dozens more like this that I use on a regular basis.
You can explore my collection on tools.simonwillison.net - the by month view is useful for browsing the entire collection.
If you want to see the code and prompts, almost all of the examples in this post include a link in their footer to "view source" on GitHub. The GitHub commits usually contain either the prompt itself or a link to the transcript used to create the tool.
These are the characteristics I have found to be most productive in building tools of this nature:
The end result is a few hundred lines of code that can be cleanly copied and pasted into a GitHub repository.
The easiest way to build one of these tools is to start in ChatGPT or Claude or Gemini. All three have features where they can write a simple HTML+JavaScript application and show it to you directly.
Claude calls this "Artifacts"; ChatGPT and Gemini both call it "Canvas". Claude has the feature enabled by default; ChatGPT and Gemini may require you to toggle it on in their "tools" menus.
Try this prompt in Gemini or ChatGPT:
Build a canvas that lets me paste in JSON and converts it to YAML. No React.
Or this prompt in Claude:
Build an artifact that lets me paste in JSON and converts it to YAML. No React.
I always add "No React" to these prompts, because otherwise they tend to build with React, resulting in a file that is harder to copy and paste out of the LLM and use elsewhere. I find that attempts which use React take longer to display (since they need to run a build step) and are more likely to contain crashing bugs for some reason, especially in ChatGPT.
All three tools have "share" links that provide a URL to the finished application. Examples:
Coding agents such as Claude Code and Codex CLI have the advantage that they can test the code themselves while they work on it using tools like Playwright. I often upgrade to one of those when I'm working on something more complicated, like my Bluesky thread viewer tool shown above.
I also frequently use asynchronous coding agents like Claude Code for web to make changes to existing tools. I shared a video about that in Building a tool to copy-paste share terminal sessions using Claude Code for web.
Claude Code for web and Codex Cloud run directly against my simonw/tools repo, which means they can publish or upgrade tools via Pull Requests (here are dozens of examples) without me needing to copy and paste anything myself.
Any time I use an additional JavaScript library as part of my tool I like to load it from a CDN.
The three major LLM platforms support specific CDNs as part of their Artifacts or Canvas features, so often if you tell them "Use PDF.js" or similar they'll be able to compose a URL to a CDN that's on their allow-list.
Sometimes you'll need to go and look up the URL on cdnjs or jsDelivr and paste it into the chat.
CDNs like these have been around for long enough that I've grown to trust them, especially for URLs that include the package version.
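Here's roughly what the pattern looks like, using js-yaml from jsDelivr - the kind of library that would power the JSON-to-YAML example earlier (treat the exact version as illustrative):

<!-- Pinning the version in the CDN URL keeps the tool reproducible -->
<script src="https://cdn.jsdelivr.net/npm/js-yaml@4.1.0/dist/js-yaml.min.js"></script>
<script>
// js-yaml exposes a global jsyaml object with load() and dump() methods
const yaml = jsyaml.dump(JSON.parse('{"name": "pelican", "wheels": 2}'));
console.log(yaml);
</script>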
The alternative to CDNs is to use npm and have a build step for your projects. I find this reduces my productivity at hacking on individual tools and makes it harder to self-host them.
I don't like leaving my HTML tools hosted by the LLM platforms themselves, for a few reasons. First, LLM platforms tend to run the tools inside a tight sandbox with a lot of restrictions. They're often unable to load data or images from external URLs, and sometimes even features like linking out to other sites are disabled.
The end-user experience often isn't great either. They show warning messages to new users, often take additional time to load and delight in showing promotions for the platform that was used to create the tool.
They're also not as reliable as other forms of static hosting. If ChatGPT or Claude are having an outage I'd like to still be able to access the tools I've created in the past.
Being able to easily self-host is the main reason I like insisting on "no React" and using CDNs for dependencies - the absence of a build step makes hosting tools elsewhere a simple case of copying and pasting them out to some other provider.
My preferred provider here is GitHub Pages because I can paste a block of HTML into a file on github.com and have it hosted on a permanent URL a few seconds later. Most of my tools end up in my simonw/tools repository which is configured to serve static files at tools.simonwillison.net.
One of the most useful input/output mechanisms for HTML tools comes in the form of copy and paste.
I frequently build tools that accept pasted content, transform it in some way and let the user copy it back to their clipboard to paste somewhere else.
Copy and paste on mobile phones is fiddly, so I frequently include "Copy to clipboard" buttons that populate the clipboard with a single touch.
Most operating system clipboards can carry multiple formats of the same copied data. That's why you can paste content from a word processor in a way that preserves formatting, but if you paste the same thing into a text editor you'll get the content with formatting stripped.
These rich copy operations are available in JavaScript paste events as well, which opens up all sorts of opportunities for HTML tools.
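A sketch of both halves of the pattern - a one-touch copy button plus a paste handler that inspects every available format (standard DOM APIs throughout):

<button id="copy">Copy to clipboard</button>
<script>
// One-touch copy - much friendlier than manual selection on mobile
document.getElementById("copy").addEventListener("click", () => {
  navigator.clipboard.writeText("text to copy goes here");
});

// Log every representation available on the clipboard when the user pastes
document.addEventListener("paste", (event) => {
  for (const item of event.clipboardData.items) {
    // item.kind is "string" or "file"; item.type is a MIME type like "text/html"
    console.log(item.kind, item.type);
  }
});
</script>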
The key to building interesting HTML tools is understanding what's possible. Building custom debugging tools is a great way to explore these options.
clipboard-viewer is one of my most useful. You can paste anything into it (text, rich text, images, files) and it will loop through and show you every type of paste data that's available on the clipboard.

This was key to building many of my other tools, because it showed me the invisible data that I could use to bootstrap other interesting pieces of functionality.
More debugging examples:
- One shows the keys (and their KeyCode values) currently being held down.

HTML tools may not have access to server-side databases for storage, but it turns out you can store a lot of state directly in the URL.
I like this for tools I may want to bookmark or share with other people.
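The mechanics are just URLSearchParams and the URL hash - a minimal sketch:

<script>
// Serialize tool state into the URL hash so bookmarks and shared links restore it
function saveState(state) {
  location.hash = new URLSearchParams(state).toString();
}

// Read it back on page load
function loadState() {
  return Object.fromEntries(new URLSearchParams(location.hash.slice(1)));
}

saveState({query: "pelican", page: "2"});
console.log(loadState()); // {query: "pelican", page: "2"}
</script>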
The localStorage browser API lets HTML tools store data persistently on the user's device, without exposing that data to the server.
I use this for larger pieces of state that don't fit comfortably in a URL, or for secrets like API keys which I really don't want anywhere near my server - even static hosts might have server logs that are outside of my influence.
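A sketch of the pattern I mean - prompt for the key once, then reuse it from localStorage on later visits (the storage key name here is arbitrary):

<script>
// Retrieve a stored API key, prompting the user only on first use.
// The key stays on-device apart from calls made directly to the API.
function getApiKey() {
  let key = localStorage.getItem("my-tool-api-key");
  if (!key) {
    key = prompt("Paste your API key:");
    if (key) localStorage.setItem("my-tool-api-key", key);
  }
  return key;
}
</script>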
- One asks for an API key (using the prompt() function) and then stores that in localStorage.
- This one uses Claude Haiku to write haikus about what it can see through the user's webcam.

CORS stands for Cross-origin resource sharing. It's a relatively low-level detail which controls whether JavaScript running on one site is able to fetch data from APIs hosted on other domains.
APIs that provide open CORS headers are a goldmine for HTML tools. It's worth building a collection of these over time.
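PyPI's JSON API is a good illustration - assuming it still serves open CORS headers, fetching package metadata from page JavaScript is trivial:

<script>
// Cross-origin fetch - this works when the API responds with
// Access-Control-Allow-Origin: * (worth verifying for any API you depend on)
fetch("https://pypi.org/pypi/datasette/json")
  .then((response) => response.json())
  .then((data) => console.log(data.info.version));
</script>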
Here are some I like:
GitHub Gists are a personal favorite here, because they let you build apps that can persist state to a permanent Gist through making a cross-origin API call.
- One fetches the .whl file for a Python package from PyPI, unzips it (in browser memory) and lets you navigate the files.

All three of OpenAI, Anthropic and Gemini offer JSON APIs that can be accessed via CORS directly from HTML tools.
Unfortunately you still need an API key, and if you bake that key into your visible HTML anyone can steal it and use it to rack up charges on your account.
I use the localStorage secrets pattern to store API keys for these services. This sucks from a user experience perspective - telling users to go and create an API key and paste it into a tool is a lot of friction - but it does work.
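Here's a sketch of a direct browser call to Anthropic's Messages API, reusing the getApiKey() helper sketched earlier - the anthropic-dangerous-direct-browser-access header is their CORS opt-in, though treat the model ID as an assumption to check against their current list:

<script>
async function writeHaiku(promptText) {
  const response = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": getApiKey(),
      "anthropic-version": "2023-06-01",
      // Required opt-in header for browser-side (CORS) access
      "anthropic-dangerous-direct-browser-access": "true",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-haiku-4-5", // assumption - check current model IDs
      max_tokens: 200,
      messages: [{role: "user", content: promptText}],
    }),
  });
  const data = await response.json();
  return data.content[0].text;
}
</script>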
Some examples:
You don't need to upload a file to a server in order to make use of the <input type="file"> element. JavaScript can access the content of that file directly, which opens up a wealth of opportunities for useful functionality.
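A minimal sketch - the file never leaves the user's machine:

<input type="file" id="picker">
<script>
// Read a user-selected file entirely in the browser, no upload involved
document.getElementById("picker").addEventListener("change", async (event) => {
  const file = event.target.files[0];
  if (!file) return;
  const text = await file.text(); // or file.arrayBuffer() for binary formats
  console.log(file.name, file.size, text.slice(0, 200));
});
</script>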
Some examples:
- One uses PDF.js and Tesseract.js to allow users to open a PDF in their browser, which it then converts to an image per page and runs through OCR.
- Another generates the ffmpeg command needed to produce a cropped copy of a video on your own machine.

An HTML tool can generate a file for download without needing help from a server.
The JavaScript library ecosystem has a huge range of packages for generating files in all kinds of useful formats.
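The core trick is a Blob and a temporary link with the download attribute - a sketch:

<script>
// Generate a file client-side and trigger a download for it
function downloadFile(filename, contents, type = "text/plain") {
  const blob = new Blob([contents], {type});
  const link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = filename;
  link.click();
  URL.revokeObjectURL(link.href);
}

downloadFile("notes.txt", "Hello from an HTML tool");
</script>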
Pyodide is a distribution of Python that's compiled to WebAssembly and designed to run directly in browsers. It's an engineering marvel and one of the most underrated corners of the Python world.
It also cleanly loads from a CDN, which means there's no reason not to use it in HTML tools!
Even better, the Pyodide project includes micropip - a mechanism that can load extra pure-Python packages from PyPI via CORS.
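Here's roughly what bootstrapping Pyodide plus micropip looks like - treat the version in the CDN URL as illustrative and check Pyodide's docs for the current release:

<script src="https://cdn.jsdelivr.net/pyodide/v0.26.2/full/pyodide.js"></script>
<script>
async function runPython() {
  // loadPyodide() is the global entry point provided by pyodide.js
  const pyodide = await loadPyodide();
  // micropip can then install pure-Python wheels from PyPI over CORS
  await pyodide.loadPackage("micropip");
  const micropip = pyodide.pyimport("micropip");
  await micropip.install("cowsay"); // any pure-Python package should work
  console.log(pyodide.runPython(`
import cowsay
cowsay.get_output_string("cow", "Hello from Python in the browser")
`));
}
runPython();
</script>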
Pyodide is possible thanks to WebAssembly. WebAssembly means that a vast collection of software originally written in other languages can now be loaded in HTML tools as well.
Squoosh.app was the first example I saw that convinced me of the power of this pattern - it makes several best-in-class image compression libraries available directly in the browser.
I've used WebAssembly for a few of my own tools:
The biggest advantage of having a single public collection of 100+ tools is that it's easy for my LLM assistants to recombine them in interesting ways.
Sometimes I'll copy and paste a previous tool into the context, but when I'm working with a coding agent I can reference them by name - or tell the agent to search for relevant examples before it starts work.
The source code of any working tool doubles as clear documentation of how something can be done, including patterns for using editing libraries. An LLM with one or two existing tools in its context is much more likely to produce working code.
I built pypi-changelog by telling Claude Code:
Look at the pypi package explorer tool
And then, after it had found and read the source code for zip-wheel-explorer:
Build a new tool pypi-changelog.html which uses the PyPI API to get the wheel URLs of all available versions of a package, then it displays them in a list where each pair has a "Show changes" clickable in between them - clicking on that fetches the full contents of the wheels and displays a nicely rendered diff representing the difference between the two, as close to a standard diff format as you can get with JS libraries from CDNs, and when that is displayed there is a "Copy" button which copies that diff to the clipboard
Here's the full transcript.
See Running OCR against PDFs and images directly in your browser for another detailed example of remixing tools to create something new.
I like keeping (and publishing) records of everything I do with LLMs, to help me grow my skills at using them over time.
For HTML tools I built by chatting with an LLM platform directly I use the "share" feature for those platforms.
For Claude Code or Codex CLI or other coding agents I copy and paste the full transcript from the terminal into my terminal-to-html tool and share that using a Gist.
In either case I include links to those transcripts in the commit message when I save the finished tool to my repository. You can see those in my tools.simonwillison.net colophon.
I've had so much fun exploring the capabilities of LLMs in this way over the past year and a half, and building these tools has been invaluable in helping me understand both the potential for building tools with HTML and the capabilities of the LLMs that I'm building them with.
If you're interested in starting your own collection I highly recommend it! All you need to get started is a free GitHub repository with GitHub Pages enabled (Settings -> Pages -> Source -> Deploy from a branch -> main) and you can start copying in .html pages generated in whatever manner you like.
Bonus transcript: Here's how I used Claude Code and shot-scraper to add the screenshots to this post.
Tags: definitions, github, html, javascript, projects, tools, ai, webassembly, generative-ai, llms, ai-assisted-programming, vibe-coding, coding-agents, claude-code
2025-12-11 04:18:58
The Normalization of Deviance in AI
This thought-provoking essay from Johann Rehberger directly addresses something that I’ve been worrying about for quite a while: in the absence of any headline-grabbing examples of prompt injection vulnerabilities causing real economic harm, is anyone going to care?

Johann describes the concept of the “Normalization of Deviance” as directly applying to this question.
Coined by Diane Vaughan, the key idea here is that organizations that get away with “deviance” - ignoring safety protocols or otherwise relaxing their standards - will start baking that unsafe attitude into their culture. This can work fine… until it doesn’t. The Space Shuttle Challenger disaster has been partially blamed on this class of organizational failure.
As Johann puts it:
In the world of AI, we observe companies treating probabilistic, non-deterministic, and sometimes adversarial model outputs as if they were reliable, predictable, and safe.
Vendors are normalizing trusting LLM output, but current understanding violates the assumption of reliability.
The model will not consistently follow instructions, stay aligned, or maintain context integrity. This is especially true if there is an attacker in the loop (e.g. indirect prompt injection).
However, we see more and more systems allowing untrusted output to take consequential actions. Most of the time it goes well, and over time vendors and organizations lower their guard or skip human oversight entirely, because “it worked last time.”
This dangerous bias is the fuel for normalization: organizations confuse the absence of a successful attack with the presence of robust security.
Tags: security, ai, prompt-injection, generative-ai, llms, johann-rehberger, ai-ethics