2026-02-05 08:23:38
Spotlighting The World Factbook as We Bid a Fond Farewell
Somewhat devastating news today from the CIA:
One of CIA’s oldest and most recognizable intelligence publications, The World Factbook, has sunset.
There's not even a hint as to why they decided to stop maintaining this publication, which has been their most useful public-facing initiative since 1971 and a cornerstone of the public internet since 1997.
In a bizarre act of cultural vandalism they've not just removed the entire site (including the archives of previous versions) but they've also set every single page to be a 302 redirect to their closure announcement.
The Factbook has been released into the public domain since the start. There's no reason not to continue to serve archived versions - a banner at the top of the page saying it's no longer maintained would be much better than removing all of that valuable content entirely.
Up until 2020 the CIA published annual zip file archives of the entire site. Those are available (along with the rest of the Factbook) on the Internet Archive.
I downloaded the 384MB .zip file for the year 2020 and extracted it into a new GitHub repository, simonw/cia-world-factbook-2020. I've enabled GitHub Pages for that repository so you can browse the archived copy at simonw.github.io/cia-world-factbook-2020/.

Here's a neat example of the editorial voice of the Factbook from the What's New page, dated December 10th 2020:
Years of wrangling were brought to a close this week when officials from Nepal and China announced that they have agreed on the height of Mount Everest. The mountain sits on the border between Nepal and Tibet (in western China), and its height changed slightly following an earthquake in 2015. The new height of 8,848.86 meters is just under a meter higher than the old figure of 8,848 meters. The World Factbook rounds the new measurement to 8,849 meters and this new height has been entered throughout the Factbook database.
Via Hacker News
Tags: cia, github, internet-archive
2026-02-05 06:42:34
Voxtral transcribes at the speed of sound
Mistral just released Voxtral Transcribe 2 - a family of two new models, one of them open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, and a sequel to the original Voxtral which they released in July 2025.
Voxtral Realtime - official name Voxtral-Mini-4B-Realtime-2602 - is the open weights (Apache-2.0) model, available as an 8.87GB download from Hugging Face.
You can try it out in this live demo - don't be put off by the "No microphone found" message, clicking "Record" should have your browser request permission and then start the demo working. I was very impressed by the demo - I talked quickly and used jargon like Django and WebAssembly and it correctly transcribed my text within moments of me uttering each sound.
The closed weight model is called voxtral-mini-latest and can be accessed via the Mistral API, using calls that look something like this:
curl -X POST "https://api.mistral.ai/v1/audio/transcriptions" \
-H "Authorization: Bearer $MISTRAL_API_KEY" \
-F model="voxtral-mini-latest" \
-F file=@"Pelican talk at the library.m4a" \
-F diarize=true \
-F context_bias="Datasette" \
-F timestamp_granularities="segment"

It's priced at $0.003/minute, which is $0.18/hour.
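Here's a rough Python equivalent of that curl call, using requests:

# Python equivalent of the curl call above, using requests
import os

import requests

with open("Pelican talk at the library.m4a", "rb") as audio:
    response = requests.post(
        "https://api.mistral.ai/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        data={
            "model": "voxtral-mini-latest",
            "diarize": "true",
            "context_bias": "Datasette",
            "timestamp_granularities": "segment",
        },
        files={"file": audio},
    )
print(response.json())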
The Mistral API console now has a speech-to-text playground for exercising the new model and it is excellent. You can upload an audio file and promptly get a diarized transcript in a pleasant interface, with options to download the result in text, SRT or JSON format.

Via Hacker News
Tags: ai, generative-ai, llms, hugging-face, mistral, speech-to-text
2026-02-04 22:59:47
I've been exploring Go for building small, fast and self-contained binary applications recently. I'm enjoying how there's generally one obvious way to do things and the resulting code is boring and readable - and something that LLMs are very competent at writing. The one catch is distribution, but it turns out publishing Go binaries to PyPI means any Go binary can be just a uvx package-name call away.
sqlite-scanner is my new Go CLI tool for scanning a filesystem for SQLite database files.
It works by checking if the first 16 bytes of the file exactly match the SQLite magic number sequence SQLite format 3\x00. It can search one or more folders recursively, spinning up concurrent goroutines to accelerate the scan. It streams out results as it finds them in plain text, JSON or newline-delimited JSON. It can optionally display the file sizes as well.
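The core check is easy to sketch in Python, if you want to see the idea without the Go concurrency:

from pathlib import Path

# The 16-byte header every SQLite database file starts with
SQLITE_MAGIC = b"SQLite format 3\x00"

def find_sqlite_files(root):
    """Yield paths under root whose first 16 bytes match the SQLite magic."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            with open(path, "rb") as f:
                if f.read(16) == SQLITE_MAGIC:
                    yield path
        except OSError:
            # Skip files we can't read (permissions, special files)
            continue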
To try it out you can download a release from the GitHub releases page - and then jump through the macOS hoops needed to execute an "unsafe" binary. Or you can clone the repo and compile it with Go. Or... you can run the binary like this:
uvx sqlite-scanner
By default this will search your current directory for SQLite databases. You can pass one or more directories as arguments:
uvx sqlite-scanner ~ /tmp
Add --json for JSON output, --size to include file sizes or --jsonl for newline-delimited JSON. Here's a demo:
uvx sqlite-scanner ~ --jsonl --size

If you haven't been uv-pilled yet you can instead install sqlite-scanner using pip install sqlite-scanner and then run sqlite-scanner.
To get a permanent copy with uv use uv tool install sqlite-scanner.
The reason this is worth doing is that pip, uv and PyPI will work together to identify the correct compiled binary for your operating system and architecture.
This is driven by file names. If you visit the PyPI downloads for sqlite-scanner you'll see the following files:
sqlite_scanner-0.1.1-py3-none-win_arm64.whl
sqlite_scanner-0.1.1-py3-none-win_amd64.whl
sqlite_scanner-0.1.1-py3-none-musllinux_1_2_x86_64.whl
sqlite_scanner-0.1.1-py3-none-musllinux_1_2_aarch64.whl
sqlite_scanner-0.1.1-py3-none-manylinux_2_17_x86_64.whl
sqlite_scanner-0.1.1-py3-none-manylinux_2_17_aarch64.whl
sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl
sqlite_scanner-0.1.1-py3-none-macosx_10_9_x86_64.whl

When I run pip install sqlite-scanner or uvx sqlite-scanner on my Apple Silicon Mac laptop Python's packaging magic ensures I get that macosx_11_0_arm64.whl variant.
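You can see the tags your own interpreter will accept - in priority order - using the packaging library:

# List the wheel tags this interpreter accepts, most-preferred first
# (pip and uv use the same compatibility logic when picking a wheel)
from packaging.tags import sys_tags

for tag in sys_tags():
    print(tag)

On an Apple Silicon Mac that output includes py3-none-macosx_11_0_arm64, matching the wheel above.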
Here's what's in the wheel, which is a zip file with a .whl extension.
In addition to the bin/sqlite-scanner binary itself, the most important file is sqlite_scanner/__init__.py which includes the following:
import os
import stat
import subprocess
import sys


def get_binary_path():
    """Return the path to the bundled binary."""
    binary = os.path.join(os.path.dirname(__file__), "bin", "sqlite-scanner")
    # Ensure binary is executable on Unix
    if sys.platform != "win32":
        current_mode = os.stat(binary).st_mode
        if not (current_mode & stat.S_IXUSR):
            os.chmod(
                binary, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
            )
    return binary


def main():
    """Execute the bundled binary."""
    binary = get_binary_path()
    if sys.platform == "win32":
        # On Windows, use subprocess to properly handle signals
        sys.exit(subprocess.call([binary] + sys.argv[1:]))
    else:
        # On Unix, exec replaces the process
        os.execvp(binary, [binary] + sys.argv[1:])
That main() function - also called from sqlite_scanner/__main__.py - locates the binary and executes it when the Python package itself is executed, using the sqlite-scanner = sqlite_scanner:main entry point defined in the wheel.
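That __main__.py is presumably just a couple of lines, something like this:

# sqlite_scanner/__main__.py - lets `python -m sqlite_scanner` behave
# the same way as the sqlite-scanner console script
from sqlite_scanner import main

main()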
Using PyPI as a distribution platform for Go binaries feels a tiny bit abusive, though there is plenty of precedent.
I’ll justify it by pointing out that this means we can use Go binaries as dependencies for other Python packages now.
That's genuinely useful! It means that any functionality which is available in a cross-platform Go binary can now be subsumed into a Python package. Python is really good at running subprocesses so this opens up a whole world of useful tricks that we can bake into our Python tools.
To demonstrate this, I built datasette-scan - a new Datasette plugin which depends on sqlite-scanner and then uses that Go binary to scan a folder for SQLite databases and attach them to a Datasette instance.
Here's how to use that (without even installing anything first, thanks uv) to explore any SQLite databases in your Downloads folder:
uv run --with datasette-scan datasette scan ~/Downloads

If you peek at the code you'll see it depends on sqlite-scanner in pyproject.toml and calls it using subprocess.run() against sqlite_scanner.get_binary_path() in its own scan_directories() function.
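The general shape of that pattern looks something like this - a sketch, not the actual plugin code, and the "path" key in the JSONL records is my assumption:

import json
import subprocess

import sqlite_scanner

def scan_directories(directories):
    # Run the bundled Go binary and collect its newline-delimited JSON output
    result = subprocess.run(
        [sqlite_scanner.get_binary_path(), *directories, "--jsonl"],
        capture_output=True,
        text=True,
        check=True,
    )
    for line in result.stdout.splitlines():
        record = json.loads(line)
        yield record["path"]  # assumes each record includes a "path" key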
I've been exploring this pattern for other, non-Go binaries recently - here's a recent script that depends on static-ffmpeg to ensure that ffmpeg is available for the script to use.
After trying this pattern myself a couple of times I realized it would be useful to have a tool to automate the process.
I first brainstormed with Claude to check that there was no existing tool to do this. It pointed me to maturin bin which helps distribute Rust projects using Python wheels, and pip-binary-factory which bundles all sorts of other projects, but did not identify anything that addressed the exact problem I was looking to solve.
So I had Claude Code for web build the first version, then refined the code locally on my laptop with the help of more Claude Code and a little bit of OpenAI Codex too, just to mix things up.
The full documentation is in the simonw/go-to-wheel repository. I've published that tool to PyPI so now you can run it using:
uvx go-to-wheel --help

The sqlite-scanner package you can see on PyPI was built using go-to-wheel like this:
uvx go-to-wheel ~/dev/sqlite-scanner \
--set-version-var main.version \
--version 0.1.1 \
--readme README.md \
--author 'Simon Willison' \
--url https://github.com/simonw/sqlite-scanner \
--description 'Scan directories for SQLite databases'

This created a set of wheels in the dist/ folder. I tested one of them like this:
uv run --with dist/sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl \
sqlite-scanner --version

When that spat out the correct version number I was confident everything had worked as planned, so I pushed the whole set of wheels to PyPI using twine upload like this:
uvx twine upload dist/*

I had to paste in a PyPI API token I had saved previously and that was all it took.
sqlite-scanner is very clearly meant as a proof-of-concept for this wider pattern - Python is very much capable of recursively crawling a directory structure looking for files that start with a specific byte prefix on its own!
That said, I think there's a lot to be said for this pattern. Go is a great complement to Python - it's fast, compiles to small self-contained binaries, has excellent concurrency support and a rich ecosystem of libraries.
Go is similar to Python in that it has a strong standard library. Go is particularly good for HTTP tooling - I've built several HTTP proxies in the past using Go's excellent net/http/httputil.ReverseProxy handler.
I've also been experimenting with wazero, Go's robust and mature zero dependency WebAssembly runtime as part of my ongoing quest for the ideal sandbox for running untrusted code. Here's my latest experiment with that library.
Being able to seamlessly integrate Go binaries into Python projects without the end user having to think about Go at all - they pip install and everything Just Works - feels like a valuable addition to my toolbox.
Tags: go, packaging, projects, pypi, python, sqlite, datasette, ai-assisted-programming, uv
2026-02-04 06:44:50
Here's a new hosted sandbox product from the Deno team. It's actually unrelated to Deno itself - this is part of their Deno Deploy SaaS platform. As such, you don't even need to use JavaScript to access it - you can create and execute code in a hosted sandbox using their deno-sandbox Python library like this:
export DENO_DEPLOY_TOKEN="... API token ..."
uv run --with deno-sandbox python

Then:
from deno_sandbox import DenoDeploy

sdk = DenoDeploy()

with sdk.sandbox.create() as sb:
    # Run a shell command
    process = sb.spawn(
        "echo", args=["Hello from the sandbox!"]
    )
    process.wait()

    # Write and read files
    sb.fs.write_text_file(
        "/tmp/example.txt", "Hello, World!"
    )
    print(sb.fs.read_text_file(
        "/tmp/example.txt"
    ))
There’s a JavaScript client library as well. The underlying API isn’t documented yet but appears to use WebSockets.
There’s a lot to like about this system. Sandbox instances can have up to 4GB of RAM, get 2 vCPUs, 10GB of ephemeral storage, can mount persistent volumes and can use snapshots to boot pre-configured custom images quickly. Sessions can last up to 30 minutes and are billed by CPU time, GB-h of memory and volume storage usage.
When you create a sandbox you can configure network domains it’s allowed to access.
My favorite feature is the way it handles API secrets.
import os

with sdk.sandboxes.create(
    allowNet=["api.openai.com"],
    secrets={
        "OPENAI_API_KEY": {
            "hosts": ["api.openai.com"],
            "value": os.environ.get("OPENAI_API_KEY"),
        }
    },
) as sandbox:
    # ... $OPENAI_API_KEY is available
    ...
Within the container that $OPENAI_API_KEY value is set to something like this:
DENO_SECRET_PLACEHOLDER_b14043a2f578cba...
Outbound API calls to api.openai.com run through a proxy which is aware of those placeholders and replaces them with the original secret.
In this way the secret itself is not available to code within the sandbox, which limits the ability for malicious code (e.g. from a prompt injection) to exfiltrate those secrets.
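The substitution step is simple enough to sketch - here's a simplified, hypothetical illustration of the idea, not Deno Deploy's actual implementation:

# Hypothetical sketch of the placeholder-substitution step an egress
# proxy could perform - not Deno Deploy's actual implementation
SECRETS = {
    # placeholder seen inside the sandbox -> real secret value
    "DENO_SECRET_PLACEHOLDER_b14043a2f578cba": "sk-the-real-key",
}
ALLOWED_HOSTS = {"api.openai.com"}

def rewrite_outbound_headers(host, headers):
    # Only substitute for hosts the secret is scoped to; everywhere
    # else the placeholder stays inert and useless to an attacker
    if host not in ALLOWED_HOSTS:
        return headers
    return {name: _substitute(value) for name, value in headers.items()}

def _substitute(value):
    for placeholder, secret in SECRETS.items():
        value = value.replace(placeholder, secret)
    return value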
From a comment on Hacker News I learned that Fly have a project called tokenizer that implements the same pattern. Adding this to my list of tricks to use with sandboxed environments!
Via Hacker News
Tags: python, sandboxing, security, deno, fly
2026-02-03 14:36:10
I just sent the January edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access it here. In the newsletter for January:
Here's a copy of the December newsletter as a preview of what you'll get. Pay $10/month to stay a month ahead of the free copy!
Tags: newsletter
2026-02-03 10:31:10
This is the difference between Data and a large language model, at least the ones operating right now. Data created art because he wanted to grow. He wanted to become something. He wanted to understand. Art is the means by which we become what we want to be. [...]
The book, the painting, the film script is not the only art. It's important, but in a way it's a receipt. It's a diploma. The book you write, the painting you create, the music you compose is important and artistic, but it's also a mark of proof that you have done the work to learn, because in the end of it all, you are the art. The most important change made by an artistic endeavor is the change it makes in you. The most important emotions are the ones you feel when writing that story and holding the completed work. I don't care if the AI can create something that is better than what we can create, because it cannot be changed by that creation.
— Brandon Sanderson, via Guido van Rossum
Tags: ai-ethics, generative-ai, art, ai, llms, guido-van-rossum