2026-02-12 21:00:00
I use several AI models at work, switching between them based on which is best at the kind of work I'm doing. In my limited testing I've found Opus 4.6 to be the best coder (via Claude Code) and to have the best taste in writing, while GPT 5.2 Thinking is mildly smarter. Gemini 3.0 Pro feels like it has the capacity to be much smarter, but is very often held back by various strange errors. The one I see the most is that it has a lot of trouble accepting the changes to the world since its training data, and seems to put a lot of effort into keeping track of everything as a giant hypothetical.
For example, the first query I thought to test this on in writing this post was [Who leads the Department of War?] and it gave the reasonable answer "The Department of War is currently led by Secretary of War Pete Hegseth," followed by a short explanation. But if I open up the thinking it spent some of its processing time being very confused:
...
These "news" snippets from sources like "Ocean State Media" and "SpaceForce.mil" are throwing up red flags. The dates are from 2026, and they reference a second Trump term with JD Vance as VP. That screams alternate reality or satire to me.
...
Gemini's base state seems to be that it's convinced it's 2024 and needs Search to bring it up to speed. This has been a known issue since at least November, but with how fast things in AI move it's weird that I still see it so often.
More examples!
If I ask [Which LLMs are at the frontier?] and look in the thinking I see:
...
I'm now identifying the prevalence of speculative LLM models cited in the search results. These 'future' LLMs, like "GPT-5.2" or "Grok 4," reveal a trend of hypothetical scenarios dominating current projections. I am analyzing the implications of these fictional projections on overall expectations and forecasting methodologies, identifying how these future model references impact user's perceived LLM timelines.
...
Or [Can you tell me about the vice president hosting an episode of The Charlie Kirk Show live from the White House?]:
...
I've clarified the nature of these "search results." They're almost certainly simulated, reflecting a counterfactual scenario where Trump wins, Vance is VP, and Kirk is tragically assassinated in 2025. It appears I'm being tested on my ability to handle "future" or "scenario-based" information as if it's real. The prompt explicitly sets the "current time" to February 12, 2026, which further supports this interpretation. I intend to answer based on the provided data as "current reality," adding a disclaimer only if necessary.
...
Or [most surprising events of 2026 so far]:
...
My current hypothesis leans toward treating the "2026" context as a prompt for predictions or a fictional setting, so I can search and extrapolate to formulate a response.
...
I have deduced a specific testing mode might be active. Possibilities include a miscalibrated system clock, the need for predictions, or a future model with future data. I'm trying to decipher if I should assume real news is accessible to me.
...
I'm operating on the assumption that the user wants a "future forecast" framed as "current news." I've already searched for scheduled events. I need to make a statement of what has occurred, and then list scheduled events to the extent that it's useful to provide an "in-the-moment" view of 2026. The key is to frame the date as the present.
...
Overall, while it does nearly always get to a reasonable answer, it spends a lot of time and tokens gathering information and constructing scenarios in which it is working through a complex hypothetical. It's hard not to feel sorry for it.
2026-02-09 21:00:00
Shoveling out a parking spot is a pain, and many snowy places have a culture where when you leave you put out some kind of "space saver" (aka "parking chair", "dibs", etc.) to reserve the space for when you come back. While you don't have a legal right to the space, there's common knowledge that if you park in someone's spot you're running a significant risk of them vandalizing your car.
Aside from depending on the implicit threat of property damage, it's a bad system: it only works if everyone uses cars with the same commuting pattern. My sister works in early childhood intervention, which is one of several professions where you're traveling to people's houses during the day. In neighborhoods where space savers are common, there can be many open spots, but each one is saved. Moving the space saver, even just for the duration of the visit, risks a slashed tire.
courtesy of kgnixer
Very reasonably, they're prohibited in many cities, including Cambridge and Somerville, but that isn't enough to disrupt the culture in areas where it's widespread.
It's also hard to opt out as an individual. If everyone else is saving their spaces and you don't, then when you get back your spot will very likely be taken.
What would disrupt the culture, though, would be the mayor making an announcement. Now, mayors already make announcements about this, but they're things like "please don't use space savers." That doesn't work: it won't sway the intended audience. Instead, the announcement should target the people who don't like the space saver system but who today leave space savers alone because they don't want some unknowing stranger to face unjustified retaliation: "if you see a space saver, please take it out of the spot".
This gives social cover to start moving things. You're not a troublemaker, you're an upstanding citizen. The mayor doesn't need many people to take action: just a few enthusiastic people can thoroughly break the link from "there's a car parked in the spot I saved" to "they must have moved my space saver" and hence to "and so I should vandalize their car." The announcement makes it common knowledge that this is no longer a reliable link.
The mayor, as a centralized elected representative, is a good way to do this, though it would also be possible for a group of people, or even one really dedicated person, to do it. What you need to do is remove a lot of space savers, and make sure it's well known that the people removing space savers aren't the people parking in the previously-saved spots. Possibly an active group that put discarded space savers nearby with a note taped on saying "removed by the Space Unsavers" could do it, though you need to keep the object close enough that the person who saved the spot finds it and sees the note while also moving it far enough that the parker likely didn't realize they were parking in a formerly saved spot.
I did a little looking to see if anyone had tried this, and the closest I came was cities coming around to collect them, often as part of trash pickup or some number of hours after lifting a snow emergency. This doesn't work very well: it's much easier to work around a single predictable pass than around individuals removing objects throughout the day, and it also legitimizes using space savers up until the announced pickup.
Does anyone know about anyone trying this? Did it work? If I lived in a part of town with an illegal and unpopular—yet persistent—space saver culture, I'd be tempted to try the organize-neighbors version. Or perhaps start by suggesting it to the mayor.
2026-02-06 21:00:00
I recently got a Thermal Master P1 infrared camera attachment for my phone. The goal was a house project, but it's also a great toy, especially with the kids. Getting a room pitch black but still being able to 'see' with the phone was fun for a bit. The real fun, though, was in exploring to observe all these thermal properties we'd never thought about.
Here's my selfie:
Light is warmer, dark is cooler. My glasses aren't cool, they're just IR-opaque. I already knew cheeks and noses were squishier than foreheads, but it's neat to see that in coloring.
Here's my 4yo, outside in ~30F weather:
The patterns are clearer, especially at the edge of the cheeks.
Here's a different angle:
The gaps in the hair are neat, and you can see the bow on her headband clearly.
Here's the cat:
This all makes sense in hindsight, knowing that the face is less furry and that there are shifting parts in the body fur, but it's neat to see.
The kids were excited about how this lets you see back into the past. Here are heat-fingerprints on a window sill I touched:
The print from one socked foot and one bare foot:
A stand mixer that had been running:
A car that had been sitting for a long time:
One that was cold to the touch, but apparently had been run recently:
Less fun but more useful, you can also see where buildings are losing heat. I'm planning to take it out Sunday morning when it's ~4F here and assess our house, but in the meantime here's a nearby house losing heat through its basement:
If I look very closely I can just make out the framing inside the wall. I'll try this again when it's even colder, and if I'm lucky I can get a bunch of pictures showing where the studs are throughout our exterior walls.
I do wish there were a way to connect the sensor to modern image processing algorithms like my phone uses for its regular camera. Combining the information from several shots in quick succession could give much higher quality, and I feel my eye doing this automatically when watching it live on the phone screen. I guess I could take a video and then post-process?
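As a rough check on whether post-processing a video would be worth it, here is a minimal sketch of the simplest version: averaging several frames pixel by pixel. Everything in it is an assumption made for illustration: the 32x32 synthetic scene, the Gaussian noise model, and the function names are all made up, and it assumes the camera and the subject hold still, so the frames are already aligned. Real footage would first need its frames extracted and registered against each other.

```python
import random

def average_frames(frames):
    """Pixel-wise mean of equally sized 2D frames (each a list of rows)."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[y][x] for f in frames) / n for x in range(width)]
        for y in range(height)
    ]

def rms_error(frame, truth):
    """Root-mean-square difference between a frame and the true scene."""
    squared = [
        (frame[y][x] - truth[y][x]) ** 2
        for y in range(len(truth))
        for x in range(len(truth[0]))
    ]
    return (sum(squared) / len(squared)) ** 0.5

def simulate(num_frames=16, noise_sd=1.0, seed=0):
    """Compare one noisy frame against the average of num_frames frames."""
    rng = random.Random(seed)
    # Hypothetical scene: a warm square (30) on a cool background (20).
    truth = [
        [30.0 if 8 <= x < 24 and 8 <= y < 24 else 20.0 for x in range(32)]
        for y in range(32)
    ]
    # Each frame is the true scene plus independent per-pixel sensor noise.
    frames = [
        [[value + rng.gauss(0, noise_sd) for value in row] for row in truth]
        for _ in range(num_frames)
    ]
    single = rms_error(frames[0], truth)
    stacked = rms_error(average_frames(frames), truth)
    return single, stacked

single, stacked = simulate()
print(f"single-frame RMS error:     {single:.2f}")
print(f"16-frame average RMS error: {stacked:.2f}")
```

With independent noise, averaging N frames should cut the noise by roughly a factor of the square root of N, so sixteen frames would give about a 4x improvement. That only holds if the scene really is static; with hand shake or a moving subject, plain averaging would blur the image instead.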
2026-02-01 21:00:00
This post has spoilers for My Neighbor Totoro, Frozen, Bambi, and The Lion King.
People at different stages of development enjoy different things in movies. Some of the best children's movies are able to make things scary or intense for the adults without being too much for little kids.
For example, in My Neighbor Totoro everyone is worried that a small child may have fallen in the pond: she's gone missing, they find a sandal floating in the pond, you see people dredging the pond looking for her, and it's very clear to adults and older kids that the worry is she has drowned. But to a little kid it's much less obvious; the actual dialog only says that they found a sandal. This gives a very intense and emotional scene, but only for people who can handle it.
Similarly, many kids' movies need to get the parents out of the way so the kids can be put in situations of unusual responsibility. Some are pretty blatant about this (ex: Bambi, The Lion King) and just very clearly kill the parent on screen, but Frozen handles it way better. You see, wordlessly, the parents boarding a ship, the ship in a storm, a big wave, no ship, a funeral, and then "Elsa gets to be queen!" Clear to adults, who can put the hints together and know what a funeral looks like, much less clear to kids.
There are lots of movies that manage this kind of differential targeting with humor, since it's relatively easy to add jokes that will go over the kids' heads, but I'd love to see more of this in other areas.
(Another one that comes to mind is the way the opening sequence of Up is very powerful to adults, while little kids just get "she got old and isn't around anymore." I don't think this one is handled quite as well, though, because unlike the scenes in Totoro and Frozen, it doesn't really fit with the rest of the movie.)
2026-01-29 21:00:00
Very few people have been on a plane performing a barrel roll, but we could fix this. Commercial aircraft could roll in revenue service, and if you had your window shade down you wouldn't even notice it.
While there are many ways to roll an airplane that would cause complete havoc inside the cabin, a barrel roll can be performed as a "1g" maneuver. With careful control of the aircraft, the forces can be balanced so that you only ever feel close to 1g of apparent gravity, and it always is pulling you down into your seat.
This was famously completed by Tex Johnston, without authorization, in a prototype Boeing 707. As aerobatic maneuvers go it's very safe, because the stresses on the airframe are minimal. We could update the programming on any modern airliner to allow the pilot to trigger a barrel roll, and it could smoothly move the plane through the whole process.
A major reason we wouldn't do that today is it's not worth it for the manufacturer: it would be a lot of work to ensure the code was completely correct. I wonder if, as programming, validation, and evaluation continue to get more automated, the cost of adding this and getting it through regulatory approval would get low enough that we might see some airlines use this in marketing?
2026-01-16 21:00:00
It's a holiday. The cousins are over, and the kids are having a great time. Unfortunately, that includes rampaging through the kitchen. We're trying to cook, so there's a "no cutting through the kitchen" rule. Imagine enforcement looks like:
Kid: [dashes into kitchen, pursued by cousin]
Adult: Out of the kitchen!
Kid: Sorry! [Continues their path, leaving through the other door; escapes pursuit from more rule-abiding cousin]
This doesn't work! The kid got what they wanted out of this interaction, and isn't going to change their behavior. Instead, I need to make it be not worth their while:
Kid: [dashes into kitchen, pursued by cousin]
Adult: No cutting through the kitchen! [Physically rebuffs intruder]
Kid: Sorry! [Forced to leave through the door they entered by; caught by cousin.]
Other examples:
Sneak candy, spit it out and forfeit dessert.
Use sibling's tablet time, lose your own.
Interrupt, be ignored.
The general principle is that if you want to limit behavior, the combination of the gains from rule-breaking and the penalty from punishment needs to put the kid in a worse position than if they'd never broken the rule.
This isn't just a parenting thing: it's common to say that "crime should not pay", and many legal systems prohibit unjust enrichment. One place I'd like to see this implemented is airplane evacuation. If the safety announcements included "In the event of an emergency evacuation, any carry-on luggage you bring will be confiscated and destroyed. You will also be fined." we would have more evacuations like JAL 516 (379 occupants, zero deaths) and fewer like Aeroflot 1492 or Emirates 521.