ongoing by Tim Bray
ongoing is short for “ongoing fragmented essay.” The unifying themes are Truth, Technology, and Business.

Long Links

2025-08-05 03:00:00

All of these Long Links pieces have begun with more or less the same words, so why stop now? This is an annotated parade of links to long-form pieces. Most people won’t have the time (nor the weird assortment of interests) to consume them all, but I hope that most readers will find one or two that reward a visit.

Radisson (and Groseilliers)

I don’t know if it is still the case, but in my youth, Canadian elementary education included several overexcited units about the Coureurs des bois, early European settlers in “New France” (now Québec) who ventured, by foot and canoe, far to the north and west, mostly engaged in trading with the indigenous peoples: trinkets (and later, serious hardware including guns) for furs.

The names I remembered were Radisson and Groseilliers, but I don’t recall learning much about who they were and what they did. Then I ran across the 2019 book Bush Runner: The Adventures of Pierre-Esprit Radisson and, wow… The writing is pedestrian but who cares because what a story! Radisson lived an absolutely astonishing life. He went as deep into the bush as anyone of his era, interacted intensely with the indigenous people as business partner, friend, and foe, worked for Charles of England and Louis of France (changing sides several times), in 1670 founded the Hudson’s Bay Company (recently, 355 years later, deceased), and fortunately took notes, a copy of which was preserved by Samuel Pepys.

I learned more from this book’s pages about the early history of Upper and Lower Canada than all those elementary-school units had to offer, and had loads of fun doing so. I guess this is a fairly Canadian-specific Long Link, but I think anyone interested in the early history of Europeans in North America would find much to enjoy.

Music

It’s rare these days that I discover interesting new musicians, but here are two of those rarities.

Lucie Horsch plays recorder, you know, the cheap plastic thing they use to introduce second-graders to music. It’s actually a lovely instrument and I wish we would switch to its German name, “Blockflöte”, which to my ear sounds a bit like the instrument does. Anyhow, check out this YouTube entitled only Lucie Horsch - Bach, annoyingly omitting any mention of which Bach. Annoyance aside, it’s a pretty great performance; Ms Horsch is the real deal, full of virtuosity and grace.

I got an unusual mid-week message from Qobuz, all excited about The New Eves’ new record The New Eve Is Rising. So I played it in the car on a long crosstown drive and now I’m all excited too. The New Eves are talented, musically surprising, and above all, insanely brave.

Their music doesn’t sound like anything else and flies in the face of all conventional wisdom concerning popular music. They take absurd chances and yeah, the album has klunkers amid the bangers, but when I got to its end I went back and started at the beginning again. I found myself smiling ear-to-ear over and over. Maybe I’m being a bit over-the-top here, but check them out: Mother is live. Cow Song is off the new album and strong, albeit with a forgettable video.

Life online

Every Long Links has hardcore-geek threads and there is no harder core imaginable than Filippo Valsorda’s Go Assembly Mutation Testing. I have always admired (but never actually used) mutation testing, and Filippo offers a convincing argument that it moves catching certain classes of bug from nearly impossible to pretty easy. Good stuff!

And of course we can’t ignore genAI and programming. Most of you are likely aware of Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, but I’m linking again to boost its visibility, because hard quantitative research on methodology is damn rare in our profession. I will confess to being a little (but just a little) surprised at the conclusions.

It is apparently quite possible that Intel will exit the business of making high-end chips, leaving TSMC with a global monopoly: Intel and the Wide Open Barn Doors. This is an unsettling prospect. Not, I have to say, surprising though. I’ve sneered at Intel leadership cluelessness for years and years, see here and here.

Finally, here’s the charmingly-titled How to Surf the Web in 2025, and Why You Should. I love this piece.

Class Reductionism

The news keeps making me want to build something around the classreductionist.org domain name I’ve owned for years.

The tl;dr on Class Reductionism is something like “In the best possible world it’ll take generations to disassemble the global tangle of intersectional oppression, but we could treat the symptoms effectively right now, this year, by sending money to the poor.” I’m talking about Universal Basic Income or suchlike. I wrote a couple thousand words on the subject back in 2023, and there are complexities, and I probably won’t put up that site. But I still do maintain that a very high proportion of our societal pain is rooted in the egregious inequality, and consequent poverty, that seems a baked-in feature of Late Capitalism.

Let’s start with Nobelist Paul Krugman, who’s been writing an “Understanding Inequality” series on his paywalled newsletter and then republishing a gratis version, start here. Very data-dense and educational. Hmm, that site is slow; there’s a livelier table of contents here.

Don’t kid yourself that this is just an American problem, see ‘The Better Life Is Out of Reach’: The Chinese Dream Is Slipping Away.

Let’s pull the impersonal veil of facts and figures aside and focus on the human experience of what we used to call Class Struggle. Confessions of the Working Poor is beautifully written and opened my eyes to lifestyle choices that I didn’t even know some people have to make.

But hey, there are people who are just fine with this: Delta's premium play is taking advantage of the growing economic split.

Look, being class-determinist-adjacent doesn’t mean you should ignore intersectional awfulness: What We Miss When We Talk About the Racial Wealth Gap.

No more sections

The remaining Long Links refused to be organized so I had to turn them loose; call it the Long Tail.

The Venetian origins of roman type. You might think you don’t care about typography but still enjoy the pictures and descriptions here.

This guy is a full-time Coyote researcher. What a great gig! I’m an admirer of those animals and how they’ve carved themselves a comfy niche in most of North America’s big cities. (Even if it means that you better not let your cat out at night.) They’re also remarkably attractive.

Here’s another long list of Long Links, and many of you will wonder why anyone would choose to browse it: The Best Camera Stores in Tokyo: The Ultimate Guide. Some of the interiors are remarkable.

Oh, while we’re on the subject of photography: A Photojournalist Took a Fujifilm Instax Camera to a Mexican Cartel Wedding.

GLP-1s (i.e. Ozempic and friends) would probably dominate a large section of the news if it weren’t for all the political craziness. Here’s one small example: How GLP-1s Are Breaking Life Insurance.

Science is hard. There are lots of largely-unsolved areas, and “gap-map.org” tries to organize them: Fundamental Development Gap Map v1.0. The UI is a little klunky but the thing still sucked me right in.

I’m going to give the last word to Laurie Penny. I don’t know what we’d do without her. In a time of monsters: do we have any ideas for surviving the zombie apocalypse that aren’t nightmare patriarchy?

De-Google Project Update

2025-07-30 03:00:00

I introduced this family project in the spring of 2024. I won’t reproduce those arguments for why we’re working on this, but in the current climate I feel like I hardly need to. Since that post, our aversion to Google dependency has only grown stronger. Progress has been non-zero but not fast.

Here’s the table, with progress notes below.

Need             | Supplier                            | Alternatives
Office           | Google Workspace                    | Proton?
Data sharing     | Dropbox                             |
Photos           | Google Photos                       | Dropbox?
Video meetings   | Google Meet                         | Jitsi, Signal?
Maps             | Google Maps                         | Magic Earth, Here, something OSM-based?
Browser          | Safari, Firefox, Vivaldi, LibreWolf |
Search           | Google                              | Bing-based options, Kagi?
Chat             | Signal                              |
Photo editing    | Adobe Lightroom & Nik               | Capture One, Darktable, ?
In-car interface | Google Android Auto                 | Automaker software
Play my music    | Plex, USB                           |
Discover music   | Qobuz                               |
TV               | Roku, Apple, migration              |

Pink indicates a strong desire to get off the incumbent service, green means we’re happy-ish with what we’re using, and blue means that, happy or not, it’s not near the top of the priority list.

I’ll reproduce the metrics we care about when looking to replace Google products, some combination of:

  1. Not ad-supported

  2. Not VC-funded

  3. Not Google, Microsoft, or Amazon

The list used to include “Open Source” but I decided that while that’s good, it’s less important than the other three criteria.

Now let’s walk down the chart.

Office

This is going to be a wrenching transition; we’ve been running the family on Google stuff forever, and I anticipate muscle-memory pain. But increasingly, using Google apps feels like being in enemy territory. And, as I said last time, I will not be sorry to shake the dust of Google Drive and Docs from my heels; I find them clumsy and am always having trouble finding something that I know is in there.

While I haven’t dug in seriously yet, I keep hearing reasonably-positive things about Proton, and nothing substantive to scare me away. Wish us luck.

Data sharing (progress!)

Dropbox is, eh, OK. It doesn’t seem actively evil, there’s no advertising, and the price is low.

Photos

We’re a four-Android family including a couple of prolific photographers, and everything just gets pumped into Google and then it fills up and then they want more money. If we could configure the phones to skip Google and go straight to Dropbox, that would be a step forward.

Video meetings

Google Meet isn’t painful but I totally suspect it of data-mining what should be private conversations. I’m getting the feeling that the technical difficulty of videoconferencing is going steadily down, so I’m reasonably optimistic that something a little less evil will come along with a fair price.

Maps

The fear and loathing that I started feeling in 2017 grows only stronger. But replacements aren’t obvious. It’s a pity; maps and directions and reviews feel like a natural monopoly that should be a public utility or something, rather than a corporate moat.

Browser (progress!)

Chrome has seriously started making my flesh crawl; once again, enemy territory. Fortunately, there are lots of good options. Even people like us who have multiple lives we need to keep separate can find enough better browsers out there.

Maybe I’ll have a look at one of the new genAI-company browsers ha ha just kidding.

Search

The reports on Kagi keep being positive and giving it a try is definitely on the To-Do list.

Chat

Signal is the only sane choice at this point in history for personal use.

Photo editing

Adobe’s products are good, and I’m proficient and happy with Lightroom, but they are definitely suffering from bad genAI craziness. Also the price is becoming unreasonable.

I’ve had a few Lightroom software failures in recent months and if that becomes a trend, looking seriously at the alternatives will move to the top of the priority list.

In-car interface

It’s tough, Android Auto is a truly great product. I think I’m stuck here for now, particularly given that I plan to be driving a 2019-model-year car for the foreseeable future. Also, it supports my music apps.

Discover music and play mine (progress!)

Progress here. I’ve almost completely stopped using YouTube Music in favor of Plex and Qobuz. Really no downside; YTM has more or less completely lost the ability to suggest good new stuff.

TV

Video continues morphing itself into Cable TV redux. We have an old Roku box that works fine and I think I’ve managed to find its don’t-spy-on-us settings. We’ll keep subscribing to Apple+ as long as they keep shipping great shows. I have zero regrets about having left Prime behind.

As for the rest, we’ve become migrants, exclusively month-at-a-time subscriptions for the purpose of watching some serial or sports league, unsubscribe after the season finale or championship game. The good news is that I haven’t encountered much friction in unsubscribing, just a certain amount of earnest pleading.

Looking forward

I have yet to confront any of the really hard parts of this project, but the sense of urgency is increasing. Let’s see.

QRS: Finite-state Struggles

2025-07-22 03:00:00

I just posted a big Quamina PR representing months of work, brought on by the addition of a small basic regular-expression feature. This ramble doesn’t exactly have a smooth story arc but I’m posting it anyhow because I know there are people out there interested in state-machine engineering and they are my people.

As far as I can tell, a couple of the problems I’m trying to solve haven’t been addressed before, at least not by anyone who published their findings. Partly because of that, I’m starting to wonder if all these disorderly Quamina postings might be worked into a small book or monograph or something. State machines are really freaking useful software constructs! So yeah, this is a war story not an essay, but if you like finite automata you’ll likely be interested in bits of it.

The story thus far

Prior to beginning work on Regular Expressions, I’d already wired shell-style “*” wildcards into Quamina, which forced me to start working with NFAs and ε-transitions. The implementation wasn’t crushingly difficult, and the performance was… OK-ish.

Which leads me to The Benchmark From Hell. I wondered how the wildcard functionality would work under heavy stress, so I pulled in a list of 12,959 five-letter strings from the Wordle source code, and inserted a “*” at a random position in each. Here are the first ten:

aalii*
*aargh
aar*ti
abaca*
a*baci
a*back
ab*acs
ab*aft
abak*a

I created an NFA for each and merged them together as described here. Building and merging the automata were plenty fast enough, and the merged NFA had 46,424 states, which felt reasonable. Matching strings against it ran at under ten thousand per second, which is pretty poor given that Quamina can do a million or two per second on patterns encoded in a DFA.

But, I thought, still reasonably usable.
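For concreteness, here’s roughly what that setup looks like at the API level. It’s a sketch rather than the actual benchmark code, and the New / AddPattern / MatchesForEvent calls and the “shellstyle” pattern syntax are approximations of Quamina’s public API; check the repo for the real signatures.

```go
package main

import (
	"fmt"
	"math/rand"

	"github.com/timbray/quamina"
)

// Roughly the Benchmark From Hell: wildcard each word at a random
// position, add the lot as shellstyle patterns, then match events
// against the merged automaton. Simplified and approximate.
func main() {
	words := []string{"aalii", "aargh", "aarti"} // really 12,959 of them

	q, err := quamina.New()
	if err != nil {
		panic(err)
	}
	for _, w := range words {
		p := rand.Intn(len(w) + 1) // random spot for the "*"
		wild := w[:p] + "*" + w[p:]
		pattern := fmt.Sprintf(`{"word": [{"shellstyle": "%s"}]}`, wild)
		if err := q.AddPattern(wild, pattern); err != nil {
			panic(err)
		}
	}

	// One probe event; the benchmark loops over many and times them.
	matches, err := q.MatchesForEvent([]byte(`{"word": "abacs"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(matches)
}
```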

The cursed “?”

Last year, my slow grind through the regexp features had led me to the zero-or-one quantifier “?”. The state machine for these things is not rocket science; there’s a discussion with pictures in my recent Epsilon Wrangling.

So I implemented that and fired off the unit tests, most of which I didn’t have to write, and they all failed. Not a surprise I guess.

It turned out that the way I’d implemented ε-transitions for the wildcards was partially wrong, as in it worked for the tight-loop state-to-itself ε-transitions, but not for more general-purpose things like “?” requires.

In fact, it turns out that merging NFAs is hard (DFAs are easy), and I found precious little help online. Thompson’s construction does give an answer: Make an otherwise-empty state with two ε-transitions, one to each of the automata, and it’ll do the right thing. Let’s call that a “splice state”. It’s easy to implement, so I did. Splicing is hardly “merging” in the Quamina sense, but still.
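In code, a splice is almost nothing. Here’s the idea with a made-up state type, nothing like Quamina’s actual internals:

```go
package nfa

// A deliberately tiny NFA state, just for illustration.
type state struct {
	moves map[byte]*state // ordinary transitions on an input byte
	eps   []*state        // ε-transitions, taken without consuming input
}

// splice is Thompson's trick: an otherwise-empty state whose two
// ε-transitions lead into the automata being combined. The result
// matches whatever either a or b matches.
func splice(a, b *state) *state {
	return &state{eps: []*state{a, b}}
}
```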

Unfortunately, the performance was hideously bad, just a few matches per second while pegging the CPU. A glance at the final NFA was sobering; endless chains of splice states, some thousands long.

At this point I became very unhappy and got stalled for months dealing with real-life issues while this problem lurked at the back of my mind, growling for attention occasionally.

Eventually I let the growler out of the cave and started to think through approaches. But first…

Worth solving?

Is it, really? What sane person is going to want to search for the union of thousands of regular expressions in general or wild-carded strings in particular?

I didn’t think about this problem at all, because of my experience with Quamina’s parent, Ruler. When it became popular among several AWS and Amazon teams, people sometimes found it useful to match the union of not just thousands but a million or more different patterns. When you write software that anyone actually uses, don’t expect the people using it to share your opinions on what is and isn’t reasonable. So I wasn’t going to get any mental peace until I cracked this nut.

I eventually decided that three approaches were worth trying:

  1. Figure out a way really to merge, not just splice, the wildcarded patterns, to produce a simpler automaton.

  2. Optimize the NFA-traversal code path.

  3. Any NFA can be transformed into a DFA, says computer-science theory. So do that, because Quamina is really fast at DFA-based matching.

Nfa2Dfa

I ended up doing all of these things and haven’t entirely given up on any of them. The most intellectually-elegant was the transform-to-DFA approach, because if I did that, I could remove the fairly-complex NFA-traversal logic from Quamina.

It turns out that the Net is rich with textbook extracts and YouTubes and slide-shows about how to do the NFA-to-DFA conversion. It ended up being quite a pleasing little chunk of code, only a couple hundred lines.
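If you haven’t run into it, the textbook “subset construction” is pleasingly small. Here’s a compressed sketch with toy types; it’s not Quamina’s nfa2Dfa, just the shape of the algorithm, in which each DFA state stands for the ε-closure of a set of NFA states:

```go
package nfa2dfa

import (
	"fmt"
	"sort"
	"strings"
)

// Toy types for illustration only; not Quamina's internals.
type nfaState struct {
	id    int                  // assumed unique, used to name state-sets
	moves map[byte][]*nfaState // may fan out to several states
	eps   []*nfaState          // ε-transitions
	goal  bool
}

type dfaState struct {
	moves map[byte]*dfaState
	goal  bool
}

// epsClosure: the given states plus everything reachable from them
// through ε-transitions alone, deduped.
func epsClosure(set []*nfaState) []*nfaState {
	seen := map[*nfaState]bool{}
	var out, stack []*nfaState
	stack = append(stack, set...)
	for len(stack) > 0 {
		s := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[s] {
			continue
		}
		seen[s] = true
		out = append(out, s)
		stack = append(stack, s.eps...)
	}
	return out
}

// key names a set of NFA states so we can tell whether we've already
// built the corresponding DFA state.
func key(set []*nfaState) string {
	ids := make([]int, len(set))
	for i, s := range set {
		ids[i] = s.id
	}
	sort.Ints(ids)
	var b strings.Builder
	for _, id := range ids {
		fmt.Fprintf(&b, "%d,", id)
	}
	return b.String()
}

// nfa2Dfa is the subset construction. This is exactly where the
// state-count explosion described below can happen.
func nfa2Dfa(start *nfaState) *dfaState {
	made := map[string]*dfaState{}
	var build func(set []*nfaState) *dfaState
	build = func(set []*nfaState) *dfaState {
		k := key(set)
		if d, ok := made[k]; ok {
			return d
		}
		d := &dfaState{moves: map[byte]*dfaState{}}
		made[k] = d // register before recursing, so loops terminate
		onSymbol := map[byte][]*nfaState{}
		for _, s := range set {
			if s.goal {
				d.goal = true
			}
			for sym, targets := range s.moves {
				onSymbol[sym] = append(onSymbol[sym], targets...)
			}
		}
		for sym, targets := range onSymbol {
			d.moves[sym] = build(epsClosure(targets))
		}
		return d
	}
	return build(epsClosure([]*nfaState{start}))
}
```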

The bad news: Converting each individual wildcard NFA to a DFA was amazingly fast, but then as I merged them in one by one, the number of automaton states started increasing explosively and the process slowed down so much that I never had the patience to let it finish. Finite-automata theory warns that this can happen, but it’s hard to characterize the cases where it does. I guess this is one of them.

Having said that, I haven’t discarded the nfa2Dfa code, because perhaps I ought to offer a Quamina option to apply this if you have some collection of patterns that you want to run really super fast and are willing to wait for a while for the transformation process to complete. Also, I may have missed opportunities to optimize the conversion; maybe it’s making more states than it needs to?

Faster NFA traversal

Recently in Epsilon Wrangling I described how NFA traversal has to work, relying heavily on implementing a thing called an ε-closure.

So I profiled the traversal process and discovered, unsurprisingly, that most of the time was going into memory allocation while computing those ε-closures. So now Quamina has an ε-closure cache and will only compute each one once.

This helped a lot but not nearly enough, and the profiler was still telling me the pain was in Go’s allocation and garbage-collection machinery. Whittling away at this kind of stuff is not rocket science. The standard Go trick I’ve seen over and over is to keep all your data in slices, keep re-using them then chopping them back to [:0] for each request. After a while they’ll have grown to the point where all the operations are just copying bytes around, no allocation required.
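Schematically, the two dodges look like this. The types are made up for illustration; the point is the closure cache keyed by state, and the scratch slices that get chopped back to [:0] instead of reallocated:

```go
package traversal

// Toy types for illustration; not Quamina's internals.
type nfaState struct {
	moves map[byte][]*nfaState
	eps   []*nfaState
}

type traverser struct {
	closureCache map[*nfaState][]*nfaState // each ε-closure computed once
	current      []*nfaState               // scratch, reused across steps
	next         []*nfaState               // scratch, reused across steps
}

func newTraverser() *traverser {
	return &traverser{closureCache: map[*nfaState][]*nfaState{}}
}

// epsClosure is computed at most once per state, then served from the cache.
func (t *traverser) epsClosure(s *nfaState) []*nfaState {
	if c, ok := t.closureCache[s]; ok {
		return c
	}
	closure := []*nfaState{s}
	for i := 0; i < len(closure); i++ { // the closure grows as we walk it
		for _, e := range closure[i].eps {
			if !member(closure, e) { // dedupe, which also breaks ε-loops
				closure = append(closure, e)
			}
		}
	}
	t.closureCache[s] = closure
	return closure
}

func member(list []*nfaState, s *nfaState) bool {
	for _, x := range list {
		if x == s {
			return true
		}
	}
	return false
}

// step consumes one input byte. Chopping next back to [:0] reuses its
// backing array; after a few events the hot path allocates nothing.
// (Deduping the output is omitted here for brevity.)
func (t *traverser) step(b byte) {
	t.next = t.next[:0]
	for _, s := range t.current {
		for _, cs := range t.epsClosure(s) {
			t.next = append(t.next, cs.moves[b]...)
		}
	}
	t.current, t.next = t.next, t.current
}
```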

Which also helped, but the speed wasn’t close to what I wanted.

Merging wildcard automata

I coded multiple ways to do this, and they kept failing. But I eventually found a way to build those automata so that any two of them, or any one of them and a DFA, can be merged to generate dramatically fewer ε-transition chains. I’m not going to write this up here for two reasons: First of all, it’s not that interesting, and second, I worry that I may have to change the approach further as I go on implementing new regexp operators.

In particular, at one point I was looking at the code while it wasn’t working, and I could see that if I added a particular conditional it would work, but I couldn’t think of a principled reason to do it. Obviously I’ll have to sort this out eventually. In the meantime, if you’re the sort of um special person who is now burning with curiosity, check out my branch from that PR and have a look at the spinout type.

Anyhow, I added that conditional even though it puzzled me a bit, and now you can add wildcard patterns to Quamina at 80K/second, and my 12.9K wildcards generate an NFA with almost 70K states, which can scan events at almost 400K/second. And that’s good enough to ship the “?” feature.

By the way, I tried feeding that 70K-state automaton to the DFA converter, and gave up after it’d burned an hour of CPU and grown to occupy many GB of RAM.

Next steps

Add “+” and “*”, and really hope I don’t have to redesign the NFA machinery again.

Also, figure out the explanation for that puzzling if statement.

And I should say…

Despite the very narrow not to say obsessive focus of this series, I’ve gotten a few bits and pieces of positive feedback. So there are a few people out there who care about this stuff. To all of you, thanks.

Memory in Saskatchewan

2025-07-10 03:00:00

I just came back from Canada’s only rectangular province. I was there to help out my 95-year-old mother while her main caregiver took vacation. It’s an unhappiness that my family has splashed itself across Canada in such a way that we have to get on an airplane (or take drives measured in days) to see each other, but that’s where we are. I came back with pictures and stories.

Let me set the stage with a couple of photos. Everyone knows that Saskatchewan is flat and brown and empty, right?

Flowers, intensely colored in near-black purple and yellow
Trees and lawns, behind a still body of water and somewhat reflected in it

Mom lives in Regina, the provincial capital, a city built round a huge park that contains the Legislature (the flowers are from its front lawn), a sizeable lake, and an artificial mini-mountain (the water and trees are from its tip). Have no fear, I’ll get to some no-kidding prairie landscapes.

Health-care drama

The night I arrived, after my Mom went to bed she got up again, tripped on something and fell hard. Her right arm was swollen, bruised, and painful. The skin and adjacent blood vessels of very old people become thin and fragile; her whole forearm was a bruise. I tried to get her to go to Emergency but she wasn’t having any of it: “You wait for hours and then they give you a pain-killer, which is constipating.” Since she could twist her wrist and wiggle her fingers and give my hand a firm grasp, I didn’t push too hard.

A couple days later on Saturday she got her regular twice-a-week visit from the public HomeCare nurse, a friendly and highly competent Nigerian immigrant, to check her meds and general condition. She looked at Mom’s wrist and said “Get her an appointment with her doctor, they’ll probably want an X-Ray.”

I called up her doctor at opening time Monday. The guy who answered the phone said “Don’t have any appointments for a couple weeks but come on over, we’ll squeeze her in.” So we went in after morning coffee and waited less than an hour. The doctor looked at her arm for 45 seconds and said “I’m writing a prescription for an X-Ray” and there was a radiologist around the corner and she was in ten minutes later. The doctor called me back that afternoon and said “Your mother’s got a broken wrist, I got her an 8AM appointment tomorrow at Regina General’s Cast Clinic.”

The doctor at the clinic looked at her wrist for another 45 seconds and said “Yeah, put on a cast” so they did and we were home by ten. I’d pessimistically overpaid a couple bucks for hospital parking.

The reason I’m including this is because I notice that this space has plenty of American readers. Did you notice that the story entirely omits insurance companies and money (except parking)? In Canada your health-care comes with your taxes (granted, higher than Americans’) and while the system is far from perfect, it can fix up an old lady’s broken wrist pretty damn fucking quick without any bureaucratic bullshit. Also, Canada spends a huge amount less per head on health-care than the US does.

And Mom told me not to forget that Saskatchewan is the birthplace of Canadian single-payer universal healthcare. Tommy Douglas, the Social Democrat who made that happen, has been named The Greatest Canadian.

Gentle surface

Oh, did I say “flat and brown and empty”? Wrong, wrong, and wrong. The Prairies, in Canada and the US too, have textures and colors and hills and valleys, it’s just that the slopes are gentle. There are really flat parts and they make farmers’ lives easier, but more or less every square inch that’s not a town or a park is farmed. I took Mom for a drive out in the country southeast of Regina, from whence these views:

A road leading slightly uphill, brilliant yellow canola on both sides
Yellow canola flowers under a blue sky

Note that in both shots we’re looking up a gentle slope. In the second, there’s farm infrastructure on the distant horizon.
Also consider the color of the sky.

In Canada that yellow-flowering crop is called “Canola”, which Wikipedia claims refers to a particular cultivar of Brassica napus, commonly known as rapeseed or just rape, so you can see why when Canada’s agribiz sector wanted to position its oil as the thing to use while cooking they went for the cultivar not the species name. I’m old enough to remember when farmers still said just “rapeseed”. Hmm, Wikipedia also claims that the OED claims this: The term “rape” derives from the Latin word for turnip, rāpa or rāpum, cognate with the Greek word ῥάφη, rhaphe.

Let’s stick with canola.

Pixelated color

After I’d taken those two canola-field shots I pulled out my Pixel and took another, but I’m not gonna share it because the Pixel decided to turn the sky from what I thought was a complex and interesting hue into its opinion of “what a blue sky looks like” only this sky didn’t.

Maybe it’s just me, but I think Google’s camera app is becoming increasingly opinionated about color, and not in a good way. There are plenty of alternative camera apps, I should check them out.

In case it’s not obvious, I love photographing Saskatchewan and think it generally looks pretty great, especially when you look up. On the province’s license plates it says “Land of living skies”, and no kidding.

Saskatchewan’s living skies (three photos)

The first two are from the park behind Mom’s place,
the third from that mini-mountain mentioned above.

Experience and memory

My Mom’s doing well for a nonagenarian. She’s smart. When I visited early last fall and we talked about the US election I was bullish on Kamala Harris’s chances. She laughed at me and said “The Americans won’t elect a woman.” Well then.

But she’s forgetful in the short term. I took her to the Legislature’s garden and to the top of the mini-mountain and for a drive out in the country and another adventure we’ll get to; she enjoyed them all. But maybe she won’t remember them.

“Make memories” they say, but what if you show someone you love a good time and maybe they won’t remember it the next day? I’m gonna say it’s still worthwhile and has a lesson to teach about what matters. There endeth the lesson.

The gallery

Indigenous people make up 17% of Regina’s population, the highest share in any significant Canadian city. By “indigenous” I mean the people that my ancestors stole the land from. It’s personal with me; around 1900, my Dad’s family, Norwegian immigrants, took over some pretty great farmland southeast of Edmonton by virtue of “homesteading”, such a nice word, isn’t it?

Regina tries to honor its indigenous heritage and my favorite expression of that is its Mackenzie Art Gallery, a lovely welcoming space in the T.C. Douglas building (for “T.C.” read “Tommy”; did I mention him?). Mom and I walked around it and had lunch in its very decent café.

Every time I’ve been there the big exhibitions in the big rooms have been indigenous-centered, and generally excellent. I try to go every time I visit and I’ve never been disappointed.

Indigenous art at Regina’s Mackenzie Gallery

In 2025, anything I have to say about this piece would be superfluous.

Every American Flag Is A Warning Sign

I love modern-art galleries, especially with big rooms full of big pieces, even if I don’t like all the art. Because it feels good to be in the presence of the work of people who are pouring out what they have to offer, especially at large scale. If the task wasn’t hard enough that failures are common then it wouldn’t be worthwhile, would it?

They’re especially great when there’s someone I love there enjoying it with me. Here’s Mom.

Jean Bray considers indigenous art

These days, any visit might be the last. I hope this wasn’t.

QRS: Epsilon Wrangling

2025-07-08 03:00:00

I haven’t shipped any new features for Quamina in many months, partly due to a flow of real-life distractions, but also I’m up against tough performance problems in implementing Regular Expressions at massive scale. I’m still looking for a breakthrough, but have learned things about building and executing finite automata that I think are worth sharing. This piece has to do with epsilons; anyone who has studied finite automata will know about them already, but I’ll offer background for those people to skip.

I’ve written about this before in Epsilon Love. A commenter pointed out that the definition of “epsilon” in that piece is not quite right per standard finite-automata theory, but it’s still useful in that it describes how epsilons support constructs like the shell-style “*”.

Background

Finite automata come in two flavors: Deterministic (DFA) and Nondeterministic (NFA). DFAs move from state to state one input symbol at a time: it’s simple and easy to understand and to implement. NFAs have two distinguishing characteristics: First, when you’re in a state and an input symbol arrives, you can transfer to more than one other state. Second, a state can have “epsilon transitions” (let’s say “ε” for epsilon), which can happen any time at all while you’re in that state, input or no input.

NFAs are more complicated to traverse (will discuss below) but you need them if you want to implement regular expressions with . and ? and * and so on. You can turn any NFA into a DFA, and I’ll come back to that subject in a future piece.

For implementing NFAs, I’ve been using Thompson's construction, where “Thompson” is Ken Thompson, co-parent of Unix. This technique is also nicely described by Russ Cox in Regular Expression Matching Can Be Simple And Fast. You don’t need to learn it to understand this piece, but I’ll justify design choices by saying “per Thompson”.

I’m going to discuss two specific issues today, ε-closures and a simpler NFA definition.

ε-closures

To set the stage, consider this regexp: A?B?C?X

It should match “X” and “BX” and “ACX” and so on, but not “CAX” or “XX”. Thompson says that you implement A? with a transition to the next state on “A” and another ε-transition to that next state; because if you see an “A” you should transition, but then you can transition anyhow even if you don’t.

The resulting NFA looks like this:

NFA matching A?B?C?X

In finite-automaton math, states are usually represented by the letter “q” followed by a number (usually italicized and subscripted, like q0, but not here, sorry). Note q4’s double circle which means it’s a goal state, i.e. if we get here we’ve matched the regexp. I should add that this was produced with draw.io, which seems to make this sort of thing easy.

Back to that NFA

So, here’s a challenge: Sketch out the traversal code in your head. Think about the input strings “AX” and “BCX” and just “X” and how you’d get through the NFA to the q4 goal state.

The trick is what’s called the ε-closure. When you get to a state, before you look at the next input symbol, you have to set up to process it. In this case, you need to be able to transition on an A or B or C. So what you do is pull together the start state q0 and also any other states you can reach from there through ε-transitions. In this case, the ε-closure for the start state is {q0, q1, q2, q3}.

Suppose, then, that you see a “B” input symbol. You apply it to all the states in the ε-closure. Only q1 matches, transitioning you to q2. Before you look at the next input symbol, you compute the ε-closure for q2, which turns out to be {q2, q3}. With this ε-closure, you can match “C” or “X”. If you get a “C”, you’ll step to q3, whose ε-closure is just itself, because “X” is the only path forward.

So your NFA-traversal algorithm for one step becomes something like:

  1. Start with a list of states.

  2. Compute the ε-closure of that list.

  3. Read an input symbol.

  4. For each state in the ε-closure, see if you can traverse to another state.

  5. If so, add it to your output list of states.

  6. When you’re done, your output list of states is the input to this algorithm for the next step.
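In Go-flavored code, with toy types (again, not Quamina’s actual data structures), one step looks something like this; note the “seen” maps, which matter in the next section:

```go
package traversal

// Toy state type for illustration.
type state struct {
	moves map[byte][]*state // transitions on an input symbol
	eps   []*state          // ε-transitions
}

// epsClosure: the input states plus everything reachable from them
// through ε-transitions alone. The seen map dedupes the result and
// keeps ε-loops from running forever.
func epsClosure(states []*state) []*state {
	seen := map[*state]bool{}
	var out, stack []*state
	stack = append(stack, states...)
	for len(stack) > 0 {
		s := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[s] {
			continue
		}
		seen[s] = true
		out = append(out, s)
		stack = append(stack, s.eps...)
	}
	return out
}

// oneStep is the algorithm above: close over ε, apply one input
// symbol, and return the deduped list of states you land in, ready
// to be the input to the next step.
func oneStep(current []*state, symbol byte) []*state {
	seen := map[*state]bool{}
	var next []*state
	for _, s := range epsClosure(current) {
		for _, target := range s.moves[symbol] {
			if !seen[target] {
				seen[target] = true
				next = append(next, target)
			}
		}
	}
	return next
}
```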

Computation issues

Suppose your regular expression is (A+BC?)+. I’m not going to sketch out the NFA, but just looking at it tells you that it has to have loopbacks; once you’ve matched the parenthesized chunk you need to go back to a state where you can recognize another occurrence. For this regexp’s NFA, computing the ε-closures can lead you into an infinite loop. (Should be obvious, but I didn’t realize it until after the first time it happened.)

You can have loops and you can also have dupes. In practice, it’s not that uncommon for a state to have more than one ε-transition, and for the targets of these transitions to overlap.

So you need to watch for loops and to dedupe your output. I think the only way to avoid this is with a cookie-crumbs “where I’ve been” trail, either as a list or a hash table.

Both of these are problematic because they require allocating memory, and that’s something you really don’t want to do when you’re trying to match patterns to events at Quamina’s historic rate of millions per second.

I’ll dig into this problem in a future Quamina-Diary outing, but obviously, caching computed epsilon closures would avoid re-doing this computation.

Anyhow, bear ε-closures in mind, because they’ll keep coming up as this series goes on.

And finally, simplifying “NFA”

At the top of this piece, I offered the standard definition of NFAs: First, when you’re in a state and an input symbol arrives, you can transfer to more than one other state. Second, you can have ε-transitions. Based on my recent work, I think this definition is redundant. Because if you need to transfer to two different states on some input symbol, you can do that with ε-transitions.

Here’s a mini-NFA that transfers from state q0 on “A” to both q1 and q2.

An NFA transferring to two different states on an input symbol

And here’s how you can achieve the same effect with ε-transitions:

Transferring to two destinations using ε-transitions

In that NFA, in qS the “S” stands for “splice”, because it’s a state that exists to connect two threads of finite-automaton traversal.

I’m pretty sure that this is more than just a mathematical equivalence. In my regexp implementation, so far at least, I’ve never encountered a need to do that first kind of dual transition. Furthermore, the “splice” structure is how Thompson implements the regular-expression “|” operator.

So if you’re building an NFA, all the traversal stuff you need in a state is a simple map from input symbol to next state, and a list of ε-transitions.
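So a state can be as small as this, and the “transfer to two states on one symbol” case reduces to a splice. Illustrative types only, not Quamina’s:

```go
package nfa

// All the traversal machinery a state needs: a single-valued map
// from input byte to next state, plus a list of ε-transitions.
type state struct {
	moves map[byte]*state
	eps   []*state
}

// addDualMove makes q0 act as if symbol led to both q1 and q2: the
// symbol really leads to a fresh splice state qS whose ε-transitions
// fan out to the two destinations.
func addDualMove(q0, q1, q2 *state, symbol byte) {
	if q0.moves == nil {
		q0.moves = map[byte]*state{}
	}
	qS := &state{eps: []*state{q1, q2}}
	q0.moves[symbol] = qS
}
```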

Next up

How my own implementation of NFA traversal collided head-on into the Benchmark From Hell and still hasn’t recovered.

The Real GenAI Issue

2025-07-07 03:00:00

Last week I published a featherweight narrative about applying GenAI in a real-world context, to a tiny programming problem. Now I’m regretting that piece because I totally ignored the two central issues with AI: What it’s meant to do, and how much it really costs.

What genAI is for

The most important fact about genAI in the real world is that there’ve been literally hundreds of billions of dollars invested in it; that link is just startups, and ignores a comparable torrent of cash pouring out of Big Tech.

The business leaders pumping in all this money of course don’t understand the technology. They’re doing this for exactly one reason: They think they can discard armies of employees and replace them with LLM services, at the cost of shipping shittier products. Do you think your management would spend that kind of money to help you with a quicker first draft or a summarized inbox?

Adobe said the quiet part out loud: Skip the Photoshoot.

At this point someone will point out that previous technology waves have generated as much employment as they’ve eliminated. Maybe so, but that’s not what business leaders think they’re buying. They think they’re buying smaller payrolls.

Maybe I’m overly sensitive, but thinking about these truths leads to a mental stench that makes me want to stay away from it.

How much does genAI cost?

Well, I already mentioned all those hundreds of billions. But that’s pocket change. The investment community in general and Venture Capital in particular will whine and moan, but the people who are losing the money are people who can afford to.

The first real cost is hypothetical: What if those business leaders are correct and they can gleefully dispose of millions of employees? If you think we’re already suffering from egregious levels of inequality, what happens when a big chunk of the middle class suddenly becomes professionally superfluous? I’m no economist so I’ll stop there, but you don’t have to be a rocket scientist to predict severe economic pain.

Then there’s the other thing that nobody talks about, the massive greenhouse-gas load that all those data centers are going to be pumping out. This at a time when we blow past one atmospheric-carbon metric after another and David Suzuki says the fight against climate change is lost, that we need to hunker down and work on survival at the local level.

The real problem

It’s the people who are pushing it. Their business goals are quite likely, as a side-effect, to make the world a worse place, and they don’t give a fuck. Their technology will inevitably worsen the onrushing climate catastrophe, and they don’t give a fuck.

It’s probably not as simple as “They’re just shitty people” — it’s not exactly easy to escape the exigencies of modern capitalism. But they are people who are doing shitty things.

Is genAI useful?

Sorry, I’m having trouble even thinking about that now.