MoreRSS

site iconSimon WillisonModify

Creator of Datasette and Lanyrd, co-creator of the Django Web Framework.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Simon Willison

Context plumbing

2025-11-29 19:26:24

Context plumbing

Matt Webb coins the term context plumbing to describe the kind of engineering needed to feed agents the right context at the right time:

Context appears at disparate sources, by user activity or changes in the user’s environment: what they’re working on changes, emails appear, documents are edited, it’s no longer sunny outside, the available tools have been updated.

This context is not always where the AI runs (and the AI runs as closer as possible to the point of user intent).

So the job of making an agent run really well is to move the context to where it needs to be. [...]

So I’ve been thinking of AI system technical architecture as plumbing the sources and sinks of context.

Tags: definitions, matt-webb, ai, generative-ai, llms, ai-agents, context-engineering

Quoting Wikipedia content guideline

2025-11-29 18:55:30

Large language models (LLMs) can be useful tools, but they are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia articles from scratch.

Wikipedia content guideline, promoted to a guideline on 24th November 2025

Tags: ai-ethics, slop, generative-ai, wikipedia, ai, llms

A ChatGPT prompt equals about 5.1 seconds of Netflix

2025-11-29 10:13:36

In June 2025 Sam Altman claimed about ChatGPT that "the average query uses about 0.34 watt-hours".

In March 2020 George Kamiya of the International Energy Agency estimated that "streaming a Netflix video in 2019 typically consumed 0.12-0.24kWh of electricity per hour" - that's 240 watt-hours per Netflix hour at the higher end.

Assuming that higher end, a ChatGPT prompt by Sam Altman's estimate uses:

0.34 Wh / (240 Wh / 3600 seconds) = 5.1 seconds of Netflix

Or double that, 10.2 seconds, if you take the lower end of the Netflix estimate instead.

I'm always interested in anything that can help contextualize a number like "0.34 watt-hours" - I think this comparison to Netflix is a neat way of doing that.

This is evidently not the whole story with regards to AI energy usage - training costs, data center buildout costs and the ongoing fierce competition between the providers all add up to a very significant carbon footprint for the AI industry as a whole.

(I got some help from ChatGPT to dig these numbers out, but I then confirmed the source, ran the calculations myself, and had Claude Opus 4.5 run an additional fact check.)

Tags: netflix, ai-energy-usage, openai, ai, llms, ai-ethics, sam-altman, generative-ai, chatgpt

Bluesky Thread Viewer thread by @simonwillison.net

2025-11-29 07:57:22

Bluesky Thread Viewer thread by @simonwillison.net

I've been having a lot of fun hacking on my Bluesky Thread Viewer JavaScript tool with Claude Code recently. Here it renders a thread (complete with demo video) talking about the latest improvements to the tool itself.

This short animated GIF demo starts with the Thread by @simonwillison.net page where a URL to a Bluesky post has been entered and a Fetch Thread button clicked. The thread is shown as a nested collection of replies. A "Hide other replies" button hides the replies revealing just the top-level self-replies by the original author - and turns into a "Show 11 other replies" button when toggled. There are tabs for Thread View and Most Recent First - the latter when clicked shows a linear list of posts with the most recent at the top. There are "Copy" and Copy JSON" green buttons at the top of the page.

I've been mostly vibe-coding this thing since April, now spanning 15 commits with contributions from ChatGPT, Claude, Claude Code for Web and Claude Code on my laptop. Each of those commits links to the transcript that created the changes in the commit.

Bluesky is a lot of fun to build tools like this against because the API supports CORS (so you can talk to it from an HTML+JavaScript page hosted anywhere) and doesn't require authentication.

Tags: projects, tools, ai, generative-ai, llms, cors, bluesky, vibe-coding, coding-agents, claude-code

Quoting Qwen3-VL Technical Report

2025-11-28 01:01:11

To evaluate the model’s capability in processing long-context inputs, we construct a video “Needle-in- a-Haystack” evaluation on Qwen3-VL-235B-A22B-Instruct. In this task, a semantically salient “needle” frame—containing critical visual evidence—is inserted at varying temporal positions within a long video. The model is then tasked with accurately locating the target frame from the long video and answering the corresponding question. [...]

As shown in Figure 3, the model achieves a perfect 100% accuracy on videos up to 30 minutes in duration—corresponding to a context length of 256K tokens. Remarkably, even when extrapolating to sequences of up to 1M tokens (approximately 2 hours of video) via YaRN-based positional extension, the model retains a high accuracy of 99.5%.

Qwen3-VL Technical Report, 5.12.3: Needle-in-a-Haystack

Tags: vision-llms, evals, generative-ai, ai-in-china, ai, qwen, llms

deepseek-ai/DeepSeek-Math-V2

2025-11-27 23:59:23

deepseek-ai/DeepSeek-Math-V2

New on Hugging Face, a specialist mathematical reasoning LLM from DeepSeek. This is their entry in the space previously dominated by proprietary models from OpenAI and Google DeepMind, both of which achieved gold medal scores on the International Mathematical Olympiad earlier this year.

We now have an open weights (Apache 2 licensed) 685B, 689GB model that can achieve the same. From the accompanying paper:

DeepSeekMath-V2 demonstrates strong performance on competition mathematics. With scaled test-time compute, it achieved gold-medal scores in high-school competitions including IMO 2025 and CMO 2024, and a near-perfect score on the undergraduate Putnam 2024 competition.

Tags: mathematics, ai, generative-ai, llms, llm-reasoning, deepseek, llm-release, ai-in-china