2026-06-25 08:00:00
In the hours following the release of CVE-2026-8461 for the project FFmpeg, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix an out-of-bounds write in the MagicYUV decoder (libavcodec/magicyuv.c) caused by improper bounds checking, resulting in heap corruption, denial of service, and potential remote code execution when processing a specially crafted video file. This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Mrs. Kitty Smitham, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."
2026-06-24 08:00:00
In the hours following the release of CVE-2026-55200 for the project libssh2, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix an out-of-bounds write in ssh2_transport_read() due to a missing upper bound check on the packet_length field, resulting in heap corruption and potential remote code execution. This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Mr. Alex Doyle, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."
2026-06-23 08:00:00
What happens if I just point a git server at an object storage bucket?
Back when I was porting agent sandboxes to Go, I built everything on top of billy, a filesystem abstraction for Go. The whole trick of the project was teaching a Tigris bucket to act enough like a filesystem that a shell interpreter and its tools couldn’t tell the difference. Billy was the key layer that made the entire façade fall into place.
After I had gotten things working, I learned that I was using billy way outside
its normal use case. It was originally made for
go-git, a pure-Go
implementation of git’s protocols and data formats. It doesn’t rely on the
/usr/bin/git binary existing at all. Every method on billy’s filesystem
interface exists purely because go-git needs it. This gave me a terrible idea: I
already have a bucket that can quack like a filesystem and go-git’s native
language is “filesystem”.
Can this Just Work™? Let's find out.
If you strip away the porcelain, a git repository is 4 basic things:
Until I started working on this I was under the impression that git stored only the patches done to an empty folder and that was how it reconstructed the history of your repository. It does not. It actually keeps track of the entire files, which explains why big binary blobs fudge the tooling so much. The diff mental model works fine for using git day to day; it’s just wrong at the storage layer, which is the layer this post lives in.
For example, let’s say I just made a new git repository and committed a
README.md to it. The tree for the .git folder looks something like this:
$ tree .git
.git
├── COMMIT_EDITMSG
├── config
├── HEAD
├── index
├── objects
│   ├── 5e
│   │   └── b8151eb669aa4467b6dea2c4bce19183cd0b41
│   ├── 6a
│   │   └── 6a8ecfcae2632152486aca3d9150ef83dedd66
│   ├── f4
│   │   └── d2487a1c6d742c8037c0296ddf80625190bd80
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   └── main
    └── tags
As you can see there are three objects. One of them is the commit
5eb8151eb669aa4467b6dea2c4bce19183cd0b41, the next is the tree, and the last
one is the README file. The main branch also points to that commit:
$ cat .git/refs/heads/main
5eb8151eb669aa4467b6dea2c4bce19183cd0b41
The cool part is that half of this is content-addressed. The content-addressed bits never change once they’ve been committed. Git objects are a great fit for Tigris’ internal model because they are append-only storage, just like the fundamental model Tigris is built upon. The things that do change often are the refs, which are updated to point to the latest commit. These are tiny files though, which means that Tigris can handle them with no effort required.
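To make "content-addressed" concrete, here is a minimal sketch of how a blob's name falls out of its contents. It assumes the default SHA-1 object format; `blobID` and `loosePath` are names made up for this illustration and are not part of go-git or objgit.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// blobID returns the object ID git assigns to a blob with this content:
// the SHA-1 of the header "blob <size>\x00" followed by the raw bytes.
func blobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

// loosePath maps an object ID to its loose-object path: the first two hex
// digits become the directory and the rest become the file name.
func loosePath(id string) string {
	return "objects/" + id[:2] + "/" + id[2:]
}

func main() {
	id := blobID([]byte("hello\n"))
	fmt.Println(id)            // ce013625030ba8dba906f756967f9e9ca394464a
	fmt.Println(loosePath(id)) // objects/ce/013625030ba8dba906f756967f9e9ca394464a
}
```

Because the name is derived from the bytes, writing the same object twice lands on the same key, and a key that exists never needs to be rewritten.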
However, when we host git repositories on a server, we end up creating single points of failure. Our git repos are hosted on single machines that can and will break. The entire implementation relies on git objects being 1:1 correlated with filesystem objects because everyone (even GitHub) shells out to the git binary to actually store files. Hosting git repos becomes one of the most stateful services in our stateless cloud-native environment.
Sure git is in-theory decentralized, but most of us have ended up using that to put our git repositories in one big store that has questionable uptime practices: GitHub. To be fair to hubbers, GitHub operates at a scale that none of us can really think about. They’ve been pushing the limits since their inception where they had to get Engine Yard to keep building them bigger servers to handle the load. They have to do everything with a big mounted filesystem because git’s tooling gives them no other option.
Now suppose this weirdness bothers you enough to do something about it. To build a git server without storing everything in the local filesystem, you have to speak git somehow, and the conventional options aren’t really all that great:
die(), which kills the process.
With libgit, you inherit the “when things go bad, die()” behaviour and your
app now suddenly starts crashing at random. This is not good for uptime.
With libgit2 (the rewrite-that’s-actually-a-library), you have to reckon with
the fact that it’s addled by the GPL (with a linking exception, try explaining
that to your lawyers), you have to eat the jump to C every time you do anything
with git (very often), development has stalled, the Go bindings have been
archived, and it still assumes a local filesystem despite assurances it does
not.
It might sound hopeless, right? You may be able to use WebAssembly or something
to contain the madness (assuming you have a good way to implement
fork()/exec() or posix_spawn() or something similar), but what if there
was a pure Go library that could handle this all for us?
Enter go-git, a pure-Go
implementation of the git protocol and internals from scratch. This doesn’t rely
on cgo or /usr/bin/git and it does not assume the repositories are stored in
the local filesystem. Its storage interface is written against billy, the exact
interface I’ve already taught to speak Tigris. I wanted a git server that was
just in a bucket and the pieces were sitting there and calling to me.
So I hacked up objgit, a git server
backed by object storage. The only filesystem call I had to add to get it
booting was MkdirAll. I wired up the
transport
package to a socket to implement the plaintext git protocol, hooked it up to a
bucket, and pushed the repo I was currently working on.
To my absolute astonishment, it worked.
Git pushed, pulled, logged, blamed, tagged, the whole kit and caboodle. I didn’t have to implement git myself; I just committed an egregious amount of shoving a square peg into a round hole until the peg went in.
In hindsight this makes an annoying amount of sense. A bare repo is those four kinds of things on a filesystem; swap the filesystem for object storage and everything else Just Working™ is perfectly logical. Git’s on-disk format is its database schema and if you fake open/stat/rename convincingly enough the entire façade keeps working because APIs are the lies we tell ourselves to make us sleep at night.
After a lot of hacking, I ended up with a feature list kinda like this:
git://, and SSH
Everything comes out of one Go binary with no local state; even the generated SSH keys are stored in the bucket. You can run this in a Kubernetes cluster where the only mutable storage required is temporary files for an optimistic cache when doing smart git clones.
The rest of this post is what it took to get from “oh no, it works” to something close to usable.
Obligatory disclaimer (like the best things in life): this is an experiment. It has not been tested thoroughly or vetted for correctness. If it breaks in half, you get to keep both pieces. Please do not move your company’s monorepo onto this and then email me when it catches fire.
Git is paranoid about durability, and its entire strategy is one Unix idiom that
you end up seeing in many places: write new data to a temporary file and then
rename(2) it into place after you’ve assured it’s correct. POSIX guarantees
that rename is atomic, so readers either see the old file or the new one, not an
intermediate state inbetwixt the two. Packfiles (bundles of objects) land as
temporary files when uploaded then moved to their permanent home. Refs are
written as locked temporary files and then renamed over the ref. It’s rename all
the way down.
Object storage traditionally does not have rename as one atomic operation. S3’s
answer is to create exactly that intermediate state: CopyObject to the new
place and DeleteObject on the old one. This makes the most load-bearing idiom
in Git’s philosophy fall to pieces.
Luckily, Tigris has an extension for this:
RenameObject. To use
it, pass an additional X-Tigris-Rename: true header to a CopyObject call and
instead of copying then deleting on the client, it moves the metadata around on
the server. One round trip, no data movement, and the Unix idiom maps onto the
bucket 1:1. Objgit’s implementation of Rename is trivial:
// internal/s3fs/basic.go
// RenameObject is a Tigris extension that renames in place (no data copy),
// so we don't need a separate CopyObject + DeleteObject.
copySource := fs3.bucket + "/" + src
_, err := fs3.client.RenameObject(ctx, &s3.CopyObjectInput{
	Bucket:     &fs3.bucket,
	CopySource: &copySource,
	Key:        &dst,
})
A second, sneakier violation hides in the same codepath. When go-git writes a
temporary file, it creates that temporary file and then immediately starts
opening it for reading so it can build the pack index. You cannot do that with a
single live object in any object storage system: you are either reading or
writing, never both. I ended up working around this by cheating a bit and
buffering the contents of newly written pack files into memory so that this game
of chicken kept working. I may have to change this to write that pack cache to
the filesystem as trying to push gcc.git made me run out of RAM. At the very
least, everything lies consistently enough that git doesn’t care, so win!
stat() calls
With this correctness sorted, I tried pushing the
golang/go repository to objgit to see how long
it would take. It did work, but it took forever. Using the Prometheus metrics
I mentioned before, I saw that it was making biblical amounts of HeadObject
calls. Some blocking profile analysis pointed to the fact that the git library
was using the stat() call to detect if a file exists. The flow was like:
And so on ad infinitum. This is fine-ish on a local filesystem because those syscalls resolve in microseconds, not the tens of milliseconds it takes to get from my office to the nearest Tigris region (please expand to Ottawa, I would love that so much).
This was compounded with a discovery that the transport I was using (SSH —
classic git:// shares the same code path) was exploding every packfile into
loose objects when pushing it. Each loose object write was costing two round
trips: stat() to check if a file exists and then open() / write() to
actually put the data into Tigris. This made a 100,000 object packfile cost
200,000 object storage calls. Call it 10ms of latency for each one, and that’s
over half an hour of waiting for responses that mostly say “404 not found”.
Caching can’t really save you here either: read caches would absorb repeated reads, but this is a firehose of writes to 100,000 paths that probably have never been read and likely will never be seen again.
The reason only two transports had this problem is a deadlock story. The git
library's fast path stores an incoming pack whole through its PackfileWriter,
by copying from the connection until io.EOF. Over HTTP that's fine: the
request body ends, EOF arrives, everyone goes home. Over git:// and SSH, the
connection is a persistent socket and the client is holding it open, politely
waiting for the server's status report. EOF never comes. The copy waits forever,
the client waits forever, and you have invented a distributed deadlock with two
participants. The original workaround was to hide the PackfileWriter
capability on those transports so go-git fell back to its streaming parser that
writes every object loose. Hence the stat storm.
So the solution was to stop depending on EOF at all. Packfiles are
self-delimiting: the header says how many objects are coming and a trailing
checksum marks the end, so a packfile scanner walks the stream and stops at the
trailer while a TeeReader mirrors exactly those bytes into the
PackfileWriter. This made the rest of the façade fall into place and the git
library is happy. Each push became two uploads, a packfile and its index,
instead of a torrent of round trips that mean nothing.
Once I got pushing fixed, I moved on to the read path. In order to emulate
ReadAt, I used ranged GetObject requests so that the git library could read
individual objects out of packfiles. I was happy with this hack, but there was
one problem: the latency curse struck again. Cloning a simple repo with 318
objects and a 200KiB packfile made over 8,500 GetObject calls before I killed
it.
A git client cloning a repository reads repository packfiles thousands of times with random access, walking objects and candidate delta bases over and over. On a local disk you never notice because your page cache eats that for breakfast. When every call is an HTTP request, a 200KiB repo turns into dozens of megabytes of round trips. A 20MiB repo was effectively unservable.
In other words, I had un-cached the one workload that caching was designed to solve.
The fix leans on a gift from git: pack files are immutable and
content-addressed. pack-<sha>.pack will never change for as long as it
exists. This makes them trivially cacheable to a faster local medium, such as
the filesystem. No invalidation logic is required. I made objgit download packs
to a local temporary folder and serve reads from there. To be on the safe side,
I did add least-recently-used caching to the mix so that my temp folder wouldn’t
blow up unexpectedly. This does mean that the first request for pack files is
slower, but then everything else is at filesystem speed.
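A least-recently-used cache with a size budget can be sketched in a few dozen lines. This version only does the bookkeeping (which packs are cached and which to evict). The type and method names are invented for this example, and the eviction policy in objgit may well differ.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	name string
	size int64
}

// packLRU tracks which packs are cached locally and evicts the least
// recently used ones once the total size exceeds the budget.
type packLRU struct {
	budget, used int64
	order        *list.List               // front = most recently used
	items        map[string]*list.Element // pack name -> element in order
}

func newPackLRU(budget int64) *packLRU {
	return &packLRU{budget: budget, order: list.New(), items: map[string]*list.Element{}}
}

// touch records a read of a cached pack and reports whether it was present.
func (c *packLRU) touch(name string) bool {
	el, ok := c.items[name]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

// add registers a freshly downloaded pack and returns the names evicted to
// stay within budget; the caller would delete those files from disk.
func (c *packLRU) add(name string, size int64) []string {
	if c.touch(name) {
		return nil
	}
	c.items[name] = c.order.PushFront(entry{name, size})
	c.used += size
	var evicted []string
	// Never evict the pack that was just added, even if it alone is too big.
	for c.used > c.budget && c.order.Len() > 1 {
		e := c.order.Remove(c.order.Back()).(entry)
		delete(c.items, e.name)
		c.used -= e.size
		evicted = append(evicted, e.name)
	}
	return evicted
}

func main() {
	c := newPackLRU(100)
	c.add("pack-aaa.pack", 60)
	c.add("pack-bbb.pack", 30)
	c.touch("pack-aaa.pack")                // aaa is now the most recent
	fmt.Println(c.add("pack-ccc.pack", 40)) // [pack-bbb.pack]
}
```

Since packs are immutable, eviction is the only way an entry ever leaves the cache; there is no staleness to check for.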
Yes, this relies on the local disk again, but only as a cache that can and will be thrown away. I think trading a stateless ideal for clones that terminate in reasonable amounts of time is a worthwhile bargain.
ListObjectsV2, Batman?
Once the other disasters were out of the way, one more remained: the metrics
showed a flood of ListObjectsV2 calls every time a clone was made, and the
calls didn’t stop after the clone was done.
Two things compounded. First, when git looks up an object that isn't packed, it
probes for a loose object at objects/<xx>/<rest-of-hash>. objgit keeps packs
whole, so there are no loose objects, so every probe misses, and each miss
across a distinct two-hex prefix triggered a directory listing to find out.
There are 256 possible prefixes. A single clone could issue up to 256
ListObjectsV2 calls whose collective answer was a resounding "there is nothing
here."
Second (and more embarrassing), the listing cache already had an optimization
for this. It collapsed entire subtree lookups into recursive scans so a single
listing of the repository could answer every stat() and probe beneath it. It was
completely dead in production. The cache matched recursive prefixes against the
repo root (refs/), but every repo is chrooted to its own directory, so real
keys look like myrepo.git/refs/heads/main. The prefix check wasn’t aware of
chroots so it never actually matched anything. Nobody noticed because a cache
that degrades to “no caching” still returns the correct answer, just slowly. To
rub it in, a cache warmer was dutifully re-listing every one of those useless
prefixes every 30 seconds for 10 minutes after each clone. Thousands of
background list calls were burned in the service of caching nothing of use.
The fix was insultingly small: when a repo’s filesystem gets chrooted, register
that chroot as a recursive subtree root within the cache. This made the cache
actually useful and resulted in only one ListObjectsV2 call instead of
hundreds. Every sufficiently advanced cache is indistinguishable from a no-op
until someone graphs the miss rate.
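The shape of the bug and the fix can be sketched like this. The `listingCache` type and its methods are hypothetical and much simpler than a real listing cache; the point is only that a prefix check against `refs/` can never match a chrooted key.

```go
package main

import (
	"fmt"
	"strings"
)

// listingCache remembers which key prefixes are answered by one recursive
// listing.
type listingCache struct {
	recursiveRoots []string
}

// covered reports whether a key falls under a registered recursive root, in
// which case a cached listing can answer the lookup without a new list call.
func (c *listingCache) covered(key string) bool {
	for _, root := range c.recursiveRoots {
		if strings.HasPrefix(key, root) {
			return true
		}
	}
	return false
}

// registerChroot adds a repository's directory as a recursive root.
func (c *listingCache) registerChroot(dir string) {
	c.recursiveRoots = append(c.recursiveRoots, strings.TrimSuffix(dir, "/")+"/")
}

func main() {
	c := &listingCache{recursiveRoots: []string{"refs/"}}
	key := "myrepo.git/refs/heads/main"

	fmt.Println(c.covered(key)) // false: "refs/" never prefixes a chrooted key
	c.registerChroot("myrepo.git")
	fmt.Println(c.covered(key)) // true
}
```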
None of these disasters were exotic. They’re the things filesystems and kernels give you for free — and every perfectly reasonable disk assumption fell to pieces once a network round trip sat at the core. Serving Git repositories is an accidental filesystem latency benchmark. If your storage abstraction has a weak point, Git will find it and the metrics will show you where that problem is.
One of the most useful parts of hosting your own git server is setting up post-receive hooks. These have been used since time immemorial for things like automatic deployments when you push code to the server. The heart of this is how we get systems like GitHub Actions: it’s code that runs when you are done pushing.
When you push to objgit with --allow-hooks enabled, it looks for a
post-receive hook in .objgit/hooks/receive-pack (this corresponds to the git
plumbing action, the name can and will be changed) in the tree of the commit you
just pushed. It will then spin up a
kefka sandbox with a
checkout of the git repository at the commit you just pushed mounted at /src
and mutable temporary files at /tmp. It gets coreutils and nothing else. No
host filesystem, no network, no arbitrary binaries. Output streams back into the
pusher as remote: lines just like when you git push heroku main. Eventually
I want to make custom commands to allow you to deploy
Tekton pipeline changes and kick off CI jobs that way,
but for now I’m happy with this working at all.
You can’t implement policy using these hooks yet. I’m working on it.
I taught a bucket to speak git. Here’s where this goes next, roughly in order of how much the ideas keep me up at night.
CI is the obvious next step. I would wire up commands for things like “apply kubernetes object” and “create tekton pipeline run” so that CI would run via your friendly neighborhood Kubernetes cluster and then notify you through some reasonable mechanism. That’s the first thing I’ll build when I have the time.
It would be nice to have a web UI for this, which is complicated for reasons
that have nothing to do with git trees, object storage, or anything else and
everything to do with the current state of the internet. Git lookups are
expensive in the best cases and with the current torrent of unethical scraping
ransacking git servers for every scrap of RAM they have, it’s probably a bad
idea to implement this without a lot of clever optimizations. Maybe the fact
that this doesn’t have load-bearing dependencies on /usr/bin/git would make it
more resilient against scrapers. The fact that this is based on object storage
could also mean that caching would be a bit easier (having basically unlimited
storage is kind of a low-key superpower for caching), but then the main issue
would be server load. It’s a tough cookie to handle.
Performance and stability are another place this needs to improve. I’ve tested this on my developer workstation but that is far different from testing it in production. There are some other performance issues that are easy to fix, but the big one is latency to Tigris. Maybe I can get the devops team to set me up a k3k cluster in production.
Right now this is an experiment as I plug along and feel out the shape of what git-on-object-storage can be. A git server with no disk, no git binary, and no database. If you want to take a look, check it out on GitHub.
2026-06-18 08:00:00
Anubis is about to get WebAssembly-based proof of work checks so that administrators can use a non-SHA256 proof of work method to protect their websites. Part of the implementation goals of this work is that the check logic is defined in one place on both client and server. The client and server will then hook into the WebAssembly in order to make sure they're running in lockstep.
However, one small problem comes up. What do you do when the client has WebAssembly disabled? I really don't want to de-facto lock people out of websites. Anubis exists in an impossible balance of user experience, administrator experience, and developer experience and any change to any of these factors disrupts the balance for other factors.
To work around this and also fulfill the goal of having check logic defined once, I decided to take inspiration from the legendary talk The Birth and Death of JavaScript and just recompile the WebAssembly to JavaScript. Sure, the resulting JavaScript will be slower than the equivalent WebAssembly (even more so because disabling WASM usually disables the JavaScript JIT, the thing that makes JavaScript fast), but it will finish eventually. Hopefully it will be more efficient than the existing JavaScript is on lower end hardware, but research is required.
Luckily enough, the tool I need (wasm2js from the binaryen project) is packaged in Linux distributions. The bad news is that distributions ship ancient versions of it that don't produce the same output as the copy from Homebrew on my development machine.
In order to really make sure that the output of this is deterministic (essential for reproducible builds), I need to bundle a copy of wasm2js. So I did that by building a version of wasm2js compiled to WebAssembly with wasi-sdk. The rest of the article is the tale of reproducibility woe that led to the implementation I ended up with. Buckle up and enjoy the ride!
There are a shocking number of ways to accidentally create nondeterministic output when doing C/C++ development. One of the easiest is to use the builtin __DATE__ and __TIME__ macros to stamp a build with the time the compiler was executed at:
// hello.cpp
#include <iostream>
int main() {
  std::cout << __DATE__ << " " << __TIME__ << std::endl;
  return 0;
}
Building and running it once gets me this:
$ make clean && make hello.wasm && wasmtime run -W exceptions=y ./hello.wasm
rm -f hello.o hello.wasm
wasi-sdk-33.0-x86_64-linux/bin/wasm32-wasip1-clang++ -O3 -fwasm-exceptions -mllvm -wasm-use-legacy-eh=false -c hello.cpp -o hello.o
wasi-sdk-33.0-x86_64-linux/bin/wasm32-wasip1-clang++ -O3 -fwasm-exceptions -mllvm -wasm-use-legacy-eh=false -fwasm-exceptions -lunwind --no-wasm-opt hello.o -o hello.wasm
Jun 18 2026 00:00:59
Another time it gets me this:
$ make clean && make hello.wasm && wasmtime run -W exceptions=y ./hello.wasm
rm -f hello.o hello.wasm
wasi-sdk-33.0-x86_64-linux/bin/wasm32-wasip1-clang++ -O3 -fwasm-exceptions -mllvm -wasm-use-legacy-eh=false -c hello.cpp -o hello.o
wasi-sdk-33.0-x86_64-linux/bin/wasm32-wasip1-clang++ -O3 -fwasm-exceptions -mllvm -wasm-use-legacy-eh=false -fwasm-exceptions -lunwind --no-wasm-opt hello.o -o hello.wasm
Jun 18 2026 00:01:11
Even though the source code had the same bytes, the output of the compiler was wildly different.
In order for users and packagers to trust the binaries of wasm2js I'm committing to the Anubis repo, I need to make sure that you can build the same version I built, down to the same bytes. For an added bonus, you should be able to build this on your machine and get the same bytes I got.
wasm-opt from $PATH behind your back
Alongside wasm2js, binaryen ships a bunch of other useful tools such as wasm-opt. wasm-opt optimizes WebAssembly compiler output to let you eke out more performance. This doesn't work in every circumstance, but when it does work it makes a huge difference. As such, clang shells out to wasm-opt when doing builds.
This normally makes sense, but in this case it caused builds to fail on my DGX Spark because its version of wasm-opt is too old:
$ uname -m && which wasm-opt && wasm-opt --version
aarch64
/usr/bin/wasm-opt
wasm-opt version 108
Compared to my workstation which installs wasm-opt from Homebrew:
$ uname -m && which wasm-opt && wasm-opt --version
x86_64
/home/linuxbrew/.linuxbrew/bin/wasm-opt
wasm-opt version 130
Turns out that wasi-sdk and binaryen rely on the WebAssembly Exceptions extension. This is a reasonable thing to assume given that wasi-sdk mostly assumes you're building things for web browsers and 93.86% of browser users have a browser engine new enough to support it. C++ is also one of the main places where exceptions are used, so I guess WebAssembly-native exception handling removes a lot of boilerplate here.
Both wasmtime and wazero require you to opt in to exception support. This is fine; we can just pass -W exceptions=y to wasmtime and use a custom runner harness for wazero. The annoying part is what happens when my arm machine's anemic build of wasm-opt sees exception handling instructions: it exits, which made the build fail.
The solution was to pass --no-wasm-opt at the linking step. This removed one angle of irreproducibility.
The version of clang that I use to compile wasm2js has some address-sensitive code generation hiding in its exception handling path. Raw pointer values leak into the order a handful of try_table blocks come out in. This surfaces as every build differing from the next by about 29 bytes:
-002a9af0: 2802 0441 0647 0d00 1f40 0103 0820 0241 (..A.G...@... .A
-002a9b00: 206a 2103 2002 4138 6a20 0141 086a 10b5 j!. .A8j .A.j..
-002a9b10: 8881 8000 2104 0b1f 4001 0304 2003 2004 ....!...@... . .
+002a9af0: 2802 0441 0647 0d00 1f40 0103 041f 4001 (..A.G...@....@.
+002a9b00: 0309 2002 4120 6a21 0320 0241 386a 2001 .. .A j!. .A8j .
+002a9b10: 4108 6a10 b588 8180 0021 040b 2003 2004 A.j......!.. . .
To make this easier to spot, here's a partial disassembly:
i32.load offset=4 ;; 28 02 04
i32.const 6 ;; 41 06
i32.ne ;; 47
br_if 0 ;; 0d 00
- try_table (catch_all_ref 8) ;; 1f 40 01 03 08
+ try_table (catch_all_ref 4) ;; 1f 40 01 03 04
+ try_table (catch_all_ref 9) ;; 1f 40 01 03 09
local.get 2 ;; 20 02
i32.const 32 ;; 41 20
i32.add ;; 6a
local.set 3 ;; 21 03
local.get 2 ;; 20 02
i32.const 56 ;; 41 38
i32.add ;; 6a
local.get 1 ;; 20 01
i32.const 8 ;; 41 08
i32.add ;; 6a
call 17461 ;; 10 b5 88 81 80 00
local.set 4 ;; 21 04
end ;; 0b
- try_table (catch_all_ref 4) ;; 1f 40 01 03 04
local.get 3 ;; 20 03
local.get 4 ;; 20 04
The computation is nearly identical, but the byte order is just different enough to also make the catch references differ. This also fires when you build this pinned version of wasm2js on arm64 machines because its pointer iteration order is different from what it is on my workstation.
To work around this, I took two steps:
setarch --addr-no-randomize.
I also made a CI job to ensure this:
- name: Ensure reproducibility
  run: |
    cd ./utils/wasm/wasm2js
    ./build.sh
    if sha256sum -c --status shasums.x86_64; then
      echo "OK: rebuilt modules match the recorded x86_64 checksums"
    elif sha256sum -c --status shasums.arm64; then
      echo "OK: rebuilt modules match the recorded arm64 checksums"
    else
      echo "::error::rebuilt wasm2js/wasm-opt match neither recorded checksum set on ${{ matrix.runner }}" >&2
      sha256sum wasm-opt_130.wasm wasm2js_130.wasm
      exit 1
    fi
To be extra sure, we have this job run on both x86_64 and arm64 hosts. I'd really love to have this be reproducible across hosts, but that's an upstream LLVM bug that I am not powerful enough to tackle. If you work on LLVM and are reading this, it would be nice to set a seed of some kind to ensure that this iteration order is fixed across architectures.
At the very least builds are deterministic within architectures. This may have to be good enough for now.
2026-06-12 08:00:00
When you see AI model pricing pages, you usually see things broken down like this:
| Model | Context Length | Max CoT Tokens | Max Output Tokens | Input Price (Cache Hit) | Input Price (Cache Miss) | Output Price |
|---|---|---|---|---|---|---|
| deepseek-chat | 64K | - | 8K | $0.07 / 1M tokens | $0.27 / 1M tokens | $1.10 / 1M tokens |
| deepseek-reasoner | 64K | 32K | 8K | $0.14 / 1M tokens | $0.55 / 1M tokens | $2.19 / 1M tokens |
If you manage to have most of your input tokens be cached, you save a huge amount, in this case $0.20 per million tokens. What does this mean though? What does caching do that makes you save so much, in some cases upwards of tens of kilodollars?
Someone explain the cached vs not thing to me for how this is $10,000 worth of savings lol
Chimney Sweepers Local 420 FKA yburyug (@bobbby.online), June 12, 2026 at 12:39 AM
I'm gonna be totally honest, I barely understand the basic outline of the math involved here. Where possible I aim to not be completely wrong here, but I'm not going to emit something 1:1 accurate with the mathematical truth of large language models' inner workings. Bear with me.
When you make an API call to a large language model service, the request looks something like the following:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'
That messages element is the key bit. Every time you accumulate messages from the initial system prompt, initial user request, AI responses and any tool use requests/responses, you add to that array and make it grow bigger and bigger.
A good way to think about this is that sending a conversation to a large language model is like having a pair of people share a roll of paper on two different typewriters. Every time you finish your message, you send the roll of paper back to the AI model and it has to re-read through the entire conversation in order to start typing on the end with its response. As the conversation gets longer, this gets more and more expensive because the model has to recalculate its internal state all over again for every additional message.
However, the expensive part of large language model inference (working out the model's internal state for everything it has read so far) is complicated but deterministic. Given the same input tokens, you will always get the same intermediate state; the randomness only comes in when picking the next token. This means that you can use a technique called key-value caching (KV caching) in order to save that intermediate state and use it for next time. Most of the time this cache is a prefix cache because that allows you to just add on more messages to the end of the request pretty easily and be fine.
Imagine something like this:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "The sky is blue because of a phenomenon..."
    },
    {
      "role": "user",
      "content": "But I am looking outside right now and it is orange!"
    }
  ]
}'
If the model has already processed the question about the sky being blue and generated the response about Rayleigh scattering, it doesn't need to process both of those messages again to answer the user's question about sunsets. In production AI model deployments you would put that generated intermediate state into the KV cache so that the model doesn't need to run twice for the same data. This saves time and effort on the side of the AI model provider, and currently model providers decide to pass that savings onto API users in the form of cheaper inference costs for cached lookups.
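As a rough illustration of what "prefix" means here, this sketch counts how many leading messages two requests share. Real inference servers match on tokens rather than whole messages, and the `message` type and `sharedPrefix` function are made up for this example.

```go
package main

import "fmt"

type message struct {
	Role, Content string
}

// sharedPrefix counts how many leading messages two requests have in common.
// A prefix cache can reuse saved state for exactly that many messages.
func sharedPrefix(prev, next []message) int {
	n := 0
	for n < len(prev) && n < len(next) && prev[n] == next[n] {
		n++
	}
	return n
}

func main() {
	seen := []message{
		{"user", "why is the sky blue?"},
		{"assistant", "The sky is blue because of a phenomenon..."},
	}
	followUp := append(append([]message{}, seen...),
		message{"user", "But I am looking outside right now and it is orange!"})
	fmt.Println(sharedPrefix(seen, followUp)) // 2: only the new message is fresh work

	// Editing an earlier message breaks the prefix from that point on.
	edited := append([]message{}, followUp...)
	edited[0].Content = "why is the sky blue at noon?"
	fmt.Println(sharedPrefix(seen, edited)) // 0
}
```

The second case is why changing anything early in the conversation is expensive: everything after the first difference has to be processed again.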
As you develop an application with AI in it, try to avoid changing any inference settings or previous messages between prompts. This makes your application's queries much more likely to read from the cache, making it faster, reducing the environmental impact, and saving you(r users) money.
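To put rough numbers on that, here is a sketch using the deepseek-chat input prices from the table above. The token counts are invented for illustration, and output tokens are left out entirely.

```go
package main

import "fmt"

// Prices in US dollars per million input tokens, taken from the
// deepseek-chat row of the pricing table.
const (
	cacheHitPrice  = 0.07
	cacheMissPrice = 0.27
)

// inputCost returns the input-token cost of a request where hitTokens were
// served from the cache and missTokens had to be processed from scratch.
func inputCost(hitTokens, missTokens int) float64 {
	return (float64(hitTokens)*cacheHitPrice + float64(missTokens)*cacheMissPrice) / 1e6
}

func main() {
	const prompt = 1_000_000 // hypothetical total input tokens

	cold := inputCost(0, prompt)           // nothing cached
	warm := inputCost(prompt-1_000, 1_000) // only the newest 1,000 tokens miss
	fmt.Printf("cold: $%.4f  warm: $%.4f\n", cold, warm)
}
```

Under those assumptions the warm request costs roughly a quarter of the cold one, and the gap compounds in long conversations where the same history is sent over and over.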
2026-06-09 08:00:00
In the hours following the release of CVE-2026-45447 for the project OpenSSL, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a heap use-after-free in PKCS7_verify(). This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Prof. Fabian Greenholt, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."