2026-03-19 23:30:59
Every time we run an UPDATE statement in a database, something disappears. The old value, whatever was there a moment ago, is gone.
In fact, most databases are designed to forget. Every UPDATE overwrites what came before, every DELETE removes it entirely, and the application is left with only a snapshot of the present state. We accept this as normal because a snapshot of the current state is the most natural way to model data.
But what if your system needs to answer a different kind of question: not just “what is the current state?” but “how did we get here?”
That’s the question Event Sourcing is built to answer. And the solution is both more rewarding and more demanding than it first appears. In this article, we will look at Event Sourcing along with its benefits and trade-offs.
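To make the idea concrete, here is a minimal sketch of an event-sourced account, assuming a simple in-memory event log (all names here are illustrative, not from any particular framework). Instead of overwriting state, we append immutable events and derive the current state by replaying them:

```python
from dataclasses import dataclass

# A minimal, hypothetical event store: nothing is ever overwritten or
# deleted; state is a fold over the append-only history.

@dataclass(frozen=True)
class Event:
    name: str
    payload: dict

class Account:
    def __init__(self):
        self.events = []          # append-only log: nothing is ever lost

    def apply(self, event):
        self.events.append(event)

    def balance(self):
        # Current state is derived by replaying history, so we can always
        # answer "how did we get here?" by inspecting self.events.
        total = 0
        for e in self.events:
            if e.name == "deposited":
                total += e.payload["amount"]
            elif e.name == "withdrawn":
                total -= e.payload["amount"]
        return total

acct = Account()
acct.apply(Event("deposited", {"amount": 100}))
acct.apply(Event("withdrawn", {"amount": 30}))
print(acct.balance())      # 70
print(len(acct.events))    # 2 -- the full history is preserved
```

The trade-off is already visible in miniature: reads require replaying (or caching) history, which is where the "more demanding" part comes in.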
2026-03-18 23:30:41
Only 35% of engineering leaders report significant ROI from AI, and most ROI models miss the full picture.
The majority of engineering time is spent on investigating alerts, diagnosing incidents, and coordinating decisions across tools that don’t share context. The cost of that work rarely appears in ROI models.
When organizations only measure what it costs to produce code, they’re missing the downstream costs that pop up in production.
Learn how engineering teams at Zscaler, DoorDash, and Salesforce are measuring AI ROI across the full engineering lifecycle and finding the largest returns in production.
When OpenAI shipped Codex, their cloud-based coding agent, the hardest problems they had to solve had almost nothing to do with the AI model itself.
The model, codex-1, is a version of OpenAI’s o3 fine-tuned for software engineering. It was important, but it was also just one component in a much larger system. The real engineering went into everything around it.
How do you assemble the right prompt from five different sources? What happens when your conversation history grows so large it threatens to exceed the model’s memory? How do you make the same agent work in a terminal, a web browser, and three different IDEs without rewriting it each time?
When the Codex team needed their agent to work inside VS Code, they first tried the obvious approach and exposed it through MCP, the emerging standard for connecting AI models to tools. It didn’t work. The rich interaction patterns that a real agent needs, things like streaming progress, pausing mid-task for user approval, and emitting code diffs, didn’t map cleanly to what MCP offered. So the team built a new protocol from scratch.
In this article, we will look at how OpenAI built the right orchestration layer around the model.
Disclaimer: This post is based on publicly shared details from the OpenAI Engineering Team. Please comment if you notice any inaccuracies.
Codex is a coding agent that can write features, fix bugs, answer questions about your codebase, and propose pull requests.

Each task runs in its own isolated cloud sandbox, preloaded with your repository. You can assign multiple tasks in parallel and monitor progress in real time.
How Codex works behind the scenes is also quite interesting. The system has three layers worth understanding: the agent loop, prompt and context management, and the multi-surface architecture that lets one agent serve many different interfaces.
At the heart of Codex is something called the agent loop. The agent takes user input, constructs a prompt, sends it to the model for inference, and gets back a response.
However, that response isn’t always a final answer. Often, the model responds with a tool call instead, something like “run this shell command and tell me what happened.” When that happens, the agent executes the tool call, appends the output to the prompt, and queries the model again with this new information. This cycle repeats, sometimes dozens of times, until the model finally produces a message for the user.
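The cycle above can be sketched schematically. This is an illustrative toy, not OpenAI's actual harness code; the message shapes and the `toy_model` stub are assumptions made for the example:

```python
# Schematic agent loop: run inference, execute any tool call the model
# requests, append the output to the prompt, and repeat until the model
# produces a final message for the user.

def agent_loop(model, tools, user_input, max_steps=50):
    prompt = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        response = model(prompt)                  # one inference call
        if response["type"] == "message":
            return response["content"]            # final answer for the user
        # Otherwise the model asked for a tool call, e.g. a shell command.
        tool = tools[response["tool"]]
        output = tool(response["args"])
        prompt.append(response)                              # record the call...
        prompt.append({"role": "tool", "output": output})    # ...and its result
    raise RuntimeError("agent did not converge")

# Toy model: first asks to run `ls`, then answers once it has seen the output.
def toy_model(prompt):
    if any(m.get("role") == "tool" for m in prompt):
        return {"type": "message", "content": "done"}
    return {"type": "tool_call", "tool": "shell", "args": "ls"}

result = agent_loop(toy_model, {"shell": lambda cmd: "main.py"}, "list files")
print(result)  # done
```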
See the diagram below:
What makes this more than a simple loop is everything the harness manages along the way.
Codex can read and edit files, run shell commands, execute test suites, invoke linters, and run type checkers. A single user request like “fix the bug in the auth module” might trigger the agent to read several files, run the existing tests to see what fails, edit the code, run the tests again, fix a linting error, and run the tests one more time before producing a final commit.
The model does the reasoning at each step, but the harness handles everything else, such as executing commands, collecting outputs, managing permissions, and deciding when the loop is done.
This distinction between model and harness matters because it shapes how developers actually use Codex. OpenAI’s own engineering teams use it to offload repetitive, well-scoped work like refactoring, renaming, writing tests, and triaging on-call issues.
The agent loop also has an outer layer. Each cycle of inference and tool calls constitutes what OpenAI calls a “turn.” However, conversations don’t end after one turn. When the user sends a follow-up message, the entire history of previous turns, including all the tool calls and their outputs, gets included in the next prompt. This is where things get expensive, and where the next layer of complexity kicks in.
See the diagram below:
When you type a request into Codex, your message becomes the bottom layer of a much larger prompt. Above it, the system stacks environment context like your current working directory and shell, the contents of any AGENTS.md files in your repository (these are project-specific instructions for the agent, covering things like coding conventions and which test commands to run), sandbox permission rules, developer instructions from configuration files, model-specific instructions, tool definitions, and a system message.

Each layer carries a role, either system, developer, or user, that signals its priority to the model. The server controls the ordering of the top layers. The client controls the rest. This layered construction means the model always has rich context about the environment it’s operating in. However, it also means the prompt is already large before the user says a single word. And it only grows from there.
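A sketch of this layered assembly might look like the following. The layer names, ordering, and message shapes here are illustrative assumptions, not OpenAI's exact format:

```python
# Hedged sketch of layered prompt construction: system and developer layers
# are stacked above the user's message, which sits at the bottom.

def build_prompt(user_message, env, agents_md, tool_defs):
    return [
        {"role": "system",    "content": "base system message"},
        {"role": "system",    "content": f"tools: {tool_defs}"},
        {"role": "developer", "content": "instructions from config files"},
        {"role": "developer", "content": f"AGENTS.md:\n{agents_md}"},
        {"role": "developer", "content": f"cwd={env['cwd']} shell={env['shell']}"},
        {"role": "user",      "content": user_message},  # the bottom layer
    ]

prompt = build_prompt("fix the auth bug",
                      {"cwd": "/repo", "shell": "bash"},
                      "run the test suite before committing",
                      ["shell", "apply_patch"])
# The user's message comes last; everything above it is context the
# system adds before the user says a single word.
print(prompt[-1]["role"])  # user
```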
Every tool call the model makes produces output that gets appended to the prompt. Every new conversation turn includes the full history of all previous turns, tool calls included.
See the diagram below:

This means that the total JSON sent to the API over the course of a conversation grows quadratically. If the first turn sends X amount of data, the second turn resends all of X plus the new data, the third turn resends all of that plus more, and so on.
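A quick back-of-the-envelope illustration of that growth, with made-up per-turn sizes:

```python
# Each turn resends the entire conversation history plus its own new data,
# so total transfer grows quadratically with the number of turns.

turn_sizes = [10, 10, 10, 10]   # KB of *new* data per turn (hypothetical)

sent_per_request = []
history = 0
for new in turn_sizes:
    history += new
    sent_per_request.append(history)   # every request carries the full history

print(sent_per_request)        # [10, 20, 30, 40]
print(sum(sent_per_request))   # 100 KB transferred for 40 KB of content
```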
OpenAI accepts this cost on purpose. They could use a server-side parameter that lets the API remember previous conversation state, avoiding the need to resend everything. They chose not to because doing so would break the statelessness of each request and prevent support for customers who require Zero Data Retention. Therefore, every request is self-contained and carries the full conversation with it.
The key mitigation is prompt caching. Since Codex always appends new content to the end of the existing prompt, the old prompt is always an exact prefix of the new one. This prefix property lets OpenAI reuse computation from previous inference calls, so even though the data transfer is quadratic, the actual model computation stays closer to linear.
However, the prefix property is fragile. Anything that changes the beginning or middle of the prompt, like switching models, changing tools, or altering sandbox configuration, breaks the cache. When OpenAI added support for MCP tools, they accidentally introduced a bug where the tools weren’t listed in a consistent order between requests. That inconsistency alone was enough to destroy cache hits.
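The prefix property, and how easily it breaks, can be shown with a toy model of prompt layers (the layer contents are illustrative):

```python
# A cache can only reuse computation for the longest shared prefix
# between consecutive prompts.

def shared_prefix_len(old, new):
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n

# Appending preserves the prefix, so the entire old prompt is reusable:
old = ["sys", "tool:apply_patch", "tool:shell", "user msg", "tool output"]
new = old + ["follow-up"]
print(shared_prefix_len(old, new))  # 5 -- full reuse

# Listing tools in an inconsistent order changes the prompt near the top,
# invalidating the cache for everything after it (the bug described above):
shuffled = ["sys", "tool:shell", "tool:apply_patch",
            "user msg", "tool output", "follow-up"]
print(shared_prefix_len(old, shuffled))  # 1 -- almost no reuse
```

This is why something as mundane as sorting the tool list deterministically matters at this scale.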
Eventually, even with caching, conversations hit the context window limit: the maximum number of tokens the model can process in a single inference call. When that happens, Codex compacts the conversation, replacing the full history with a smaller, representative version that preserves the model’s understanding of what happened through an encrypted payload carrying the model’s latent state. In reality, the compaction mechanism involves more nuance than a simple summary, but the core idea stands: managing the context window is a first-class engineering problem, not an afterthought.
AGENTS.md files deserve a quick mention here because they represent a design decision about where context should live. Rather than hardcoding project-specific knowledge into the system, OpenAI lets developers place AGENTS.md files in their repositories, right alongside their code. These files tell Codex how to navigate the codebase, which commands to run for testing, and how to follow the project’s conventions. The model performs better with them, but also works without them.
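A hypothetical AGENTS.md might look something like this (the contents below are invented for illustration; the real format is free-form instructions):

```markdown
# AGENTS.md (hypothetical example)

## Testing
- Run `make test` before proposing a commit; unit tests live under `tests/`.

## Conventions
- Use the repository's logging helper rather than printing directly.
- Follow the existing module layout when adding new files.

## Navigation
- API handlers live in `api/`; shared utilities live in `lib/`.
```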
Codex started life as a CLI tool. You ran it in your terminal, and it operated on your local codebase.
Then OpenAI needed it in VS Code, then in a web app, then as a macOS desktop app. On top of that, third-party IDEs like JetBrains and Xcode wanted to integrate it as well. Rewriting the agent logic for every surface was not an option.
As mentioned earlier, the first attempt was to expose Codex as an MCP server. However, the team found that MCP’s semantics couldn’t carry the full weight of what an agent conversation actually looks like. Codex needed to stream incremental progress as the model reasoned. It needed to pause mid-task and ask the user for approval before running certain commands. It needed to emit structured diffs. These interaction patterns were too rich for what MCP offered at the time.
So they built the App Server. All of the core agent logic, the agent loop, thread management, tool execution, configuration, and authentication live in a single codebase that OpenAI calls “Codex core.” The App Server wraps this core in a JSON-RPC protocol that any client can speak over standard input/output.
The protocol is fully bidirectional:
The client can send requests to the server (start a thread, submit a task).
The server can also send requests back to the client, for example, asking for approval before executing a shell command.
The agent’s turn pauses until the user responds with “allow” or “deny.” This lets the agent balance autonomy with human oversight without hardcoding that policy into the agent loop itself.
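The exchange might look roughly like this on the wire. The method names and fields below are illustrative assumptions, not the actual App Server protocol:

```text
Client → server: start a task (method names are hypothetical)
  {"jsonrpc": "2.0", "id": 1, "method": "thread/start",
   "params": {"prompt": "fix the failing test in auth module"}}

Server → client: ask for approval before running a command
  {"jsonrpc": "2.0", "id": 2, "method": "approval/request",
   "params": {"command": "npm test"}}

Client → server: the user's decision; the turn resumes only after this
  {"jsonrpc": "2.0", "id": 2, "result": {"decision": "allow"}}
```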
Different surfaces use this architecture differently:
The VS Code extension and the desktop app bundle the App Server binary, launch it as a child process, and keep a bidirectional stdio channel open.
The web app runs the App Server inside a cloud container. A worker provisions the container with the checked-out repository, launches the binary, and streams events to the browser over HTTP. State lives on the server, so work continues even if the user closes the tab.
Partners like Xcode decouple their release cycles from OpenAI’s by keeping their client stable and pointing it at newer App Server binaries as they become available. The protocol is designed to be backward compatible, so older clients can safely talk to newer servers.
This architecture wasn’t planned from the start. It evolved from a CLI, through a failed MCP attempt, to the App Server protocol that now underpins every Codex surface. That trajectory is itself a useful lesson about system design: the right abstraction usually doesn’t exist until you’ve tried the wrong one.
OpenAI’s experience shows that the model is a component and the agent is the system. Most of the engineering is in the system.
If you use tools like Codex, understanding these mechanics helps you use them more effectively. Writing clear AGENTS.md files gives the agent project-specific context that meaningfully improves its output. Scoping tasks tightly works better than vague, open-ended requests because the agent loop is most effective when each cycle has a clear next step. And knowing that long conversations degrade due to context window limits and compaction explains why starting fresh threads for new tasks often gives better results.
Codex still has real constraints. It can’t accept image inputs for frontend work. You can’t course-correct the agent mid-task. Delegating to a remote agent takes longer than interactive editing, and that shift in workflow takes getting used to. OpenAI is working toward a future where interacting with Codex feels more like asynchronous collaboration with a colleague, but the gap between that vision and the current product is still significant.
2026-03-17 23:30:44
Code reviews are critical but time-consuming. CodeRabbit acts as your AI co-pilot, providing instant code review comments and assessing the potential impact of every pull request.
Beyond just flagging issues, CodeRabbit provides one-click fix suggestions and lets you define custom code quality rules using AST Grep patterns, catching subtle issues that traditional static analysis tools might miss.
CodeRabbit reviews 1 million PRs every week across 3 million repositories and is used by 100,000 open-source projects.
CodeRabbit is free for all open-source repos.
The Reddit Engineering Team completed one of the most demanding infrastructure migrations in the company’s history. It moved its entire Apache Kafka fleet, comprising over 500 brokers and more than a petabyte of live data, from Amazon EC2 virtual machines onto Kubernetes.
The migration was done with zero downtime and without asking a single client application to change how it connected to Kafka.
In this article, we will look at the breakdown of this migration, the challenges the engineering team faced, and how they achieved their goal of a successful migration.
Disclaimer: This post is based on publicly shared details from the Reddit Engineering Team. Please comment if you notice any inaccuracies.
To put things into perspective, let us first understand what exactly Apache Kafka is.
Apache Kafka is an open-source message streaming platform. Applications called producers write messages into Kafka partitions, and other applications called consumers read those messages out. Kafka sits in the middle and stores those messages reliably, even if the producer and consumer are running at completely different times. A single Kafka server is called a broker, whereas a collection of brokers working together forms a cluster.
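The core ideas can be modeled in a few lines. This is a toy in-memory sketch of the concepts, not real Kafka:

```python
# Toy model of a Kafka partition: producers append messages, the log stores
# them durably in order, and consumers read from an offset at their own pace.

class Partition:
    def __init__(self):
        self.log = []                 # messages stored in arrival order

    def produce(self, message):
        self.log.append(message)
        return len(self.log) - 1      # offset of the new message

    def consume(self, offset):
        return self.log[offset:]      # a consumer resumes where it left off

p = Partition()
p.produce("upvote:post-1")
p.produce("comment:post-2")

# The consumer can run long after the producer wrote these messages:
print(p.consume(0))  # ['upvote:post-1', 'comment:post-2']
print(p.consume(1))  # ['comment:post-2']
```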
At Reddit, Apache Kafka is not a peripheral tool. It sits underneath hundreds of business-critical services, processing tens of millions of messages every second. If Kafka went down, large portions of Reddit would break.
Before the migration, Reddit managed its Kafka brokers on Amazon EC2 instances using a combination of Terraform, Puppet, and custom scripts. Operators handled upgrades, configuration changes, and machine replacements by running commands directly from their laptops. This worked fine up to a point, but as the fleet grew, it became increasingly slow, error-prone, and expensive. Reddit needed a more scalable and reliable way to operate Kafka.
Kubernetes, paired with a tool called Strimzi, offered that path.
Kubernetes is an open-source platform for running and managing containerized applications. Instead of manually provisioning and maintaining individual servers, Kubernetes lets developers describe what should be running and handles deployment, scaling, and recovery automatically. Strimzi, on the other hand, is a project under the Cloud Native Computing Foundation that specifically lets you run Kafka on Kubernetes. It provides a declarative way to manage Kafka clusters. This means that developers can describe what they want in a configuration file, and Strimzi handles deployment, upgrades, and maintenance. This promised fewer manual interventions and more predictable operations.
Reddit did not jump straight into moving brokers. Before writing a single line of migration code, Reddit identified four hard constraints that ruled out entire categories of approaches. The constraints are as follows:
Kafka had to stay up. There was no acceptable maintenance window. Downtime, data loss, or forcing client applications to change their configuration was not an option. This ruled out scheduled cutovers, dual-write strategies, and replay-based migrations.
Kafka’s metadata could not be rebuilt from scratch. Apache Kafka maintains a detailed internal state called metadata. This includes information about which brokers exist, which broker holds which data, and where replicas of that data are stored. ZooKeeper, an external service, was responsible for managing this metadata. There is no supported way to recreate this metadata on a fresh cluster while keeping the system available. New brokers had to join the existing cluster rather than replace it.
Client connectivity was tightly coupled to specific brokers. Over time, applications across Reddit had been configured to connect directly to specific broker hostnames, typically the first few brokers in a cluster, rather than using a single load-balanced endpoint. Turning off those brokers would immediately break hundreds of services. Reddit did not control the layer through which clients found and connected to Kafka.
Every step had to be reversible. No single action during the migration could leave the system in a state from which recovery was impossible. This meant Reddit had to accept a long period where EC2 brokers and Kubernetes brokers ran side by side, and it meant that riskier changes had to wait until everything else was stable.
The first phase of the migration did not touch Kafka at all.
Reddit introduced a DNS facade, which is a set of DNS records that act as an intermediate layer between client applications and the actual Kafka brokers. DNS is the system that translates human-readable names into the addresses of servers. By creating new, infrastructure-controlled DNS names that initially pointed to the same EC2 brokers, Reddit changed nothing from the perspective of client applications.
Reddit then rolled out these new connection strings across more than 250 services using automated tooling that generated batch pull requests to update configuration files. Once all clients were talking through this DNS layer, Reddit could change where those names pointed, from EC2 to Kubernetes, without modifying any client code.
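In zone-file terms, the facade might look something like this (all hostnames below are invented for illustration):

```text
; Step 1: new infrastructure-controlled names alias the existing EC2
; brokers -- clients update their connection strings, nothing else changes.
kafka-0.events.internal.   CNAME   ec2-broker-10.example.net.
kafka-1.events.internal.   CNAME   ec2-broker-11.example.net.

; Step 2 (much later): repoint the same names at Kubernetes brokers --
; no client code or configuration changes required.
kafka-0.events.internal.   CNAME   broker-0.kafka.svc.cluster.local.
kafka-1.events.internal.   CNAME   broker-1.kafka.svc.cluster.local.
```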
Each Kafka broker is identified by a unique numeric ID. Strimzi assigns broker IDs starting at 0 by default. However, Reddit’s existing EC2 brokers already occupied those low numbers.
To free up that ID space, Reddit doubled the cluster size by adding new EC2 brokers with higher IDs, then terminated the original low-numbered brokers. This shifted all data onto the higher-numbered brokers and opened up IDs 0, 1, 2, and so on for Strimzi-managed brokers to use.
See the diagram below:
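The ID-space shuffle reduces to a simple set operation. Broker counts and ID offsets below are illustrative, not Reddit's actual numbers:

```python
# Free the low broker IDs by doubling the cluster at higher IDs, moving
# data over, then retiring the originals.

original = {0, 1, 2}                     # existing EC2 brokers hold IDs 0..2

# Add new EC2 brokers at higher IDs, migrate all data onto them,
# then terminate the original low-numbered brokers:
high = {id_ + 100 for id_ in original}   # e.g. 100, 101, 102
cluster = high                           # after data moves, originals retire

free_for_strimzi = original - cluster
print(sorted(free_for_strimzi))  # [0, 1, 2] -- available for Kubernetes brokers
```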
This was the most technically complex phase.
Reddit needed Strimzi brokers running on Kubernetes to join the same cluster as the existing EC2 brokers and communicate with them directly. Strimzi does not support this out of the box, so Reddit created a fork of the Strimzi operator. The changes Reddit made were deliberately small and targeted:
The inter-broker listener configuration was set to use plaintext listeners accessible from both EC2 and Kubernetes, ensuring brokers in different environments could talk to each other.
The ZooKeeper connection was pointed at Reddit’s existing EC2-hosted ZooKeeper, so that both old and new brokers shared the same metadata store and were part of the same logical cluster.
The Cruise Control topic was overridden to stay consistent across both broker sets, allowing Reddit to use Cruise Control to move data between EC2 and Kubernetes brokers. Cruise Control is a Kafka tool that automates the process of rebalancing data across brokers in a controlled, measured way. It was central to the actual movement of data during the migration.
Running a forked operator in production carries risk. Reddit kept the scope of changes narrow and planned from the start to switch back to the standard Strimzi operator once the migration was complete.
With both sets of brokers running inside the same cluster, Reddit used Cruise Control to incrementally move partition leadership and replicated data from EC2 brokers to the Kubernetes brokers.
Partition leadership determines which broker is responsible for serving reads and writes for a given piece of data. Kafka stores copies of each partition on multiple brokers for redundancy. This is called the replication factor. Moving data meant reassigning both the leadership and the replicas to the new set of brokers, one partition at a time.
Reddit monitored this process continuously as the partition leadership on EC2 declined steadily over roughly a week while leadership on Strimzi climbed in parallel. Network traffic followed the same pattern. At every point, Reddit could pause or reverse the process if something looked wrong.
See the dashboard view below:

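The incremental handover can be pictured as follows. This is a simplified illustration in the spirit of Cruise Control, not its actual rebalancing algorithm, and the broker and topic names are invented:

```python
# Move one partition at a time: copy replicas to the new brokers, then
# shift leadership. Progress is observable and each step is reversible.

partitions = {
    "events-0": {"leader": "ec2-101", "replicas": ["ec2-101", "ec2-102"]},
    "events-1": {"leader": "ec2-102", "replicas": ["ec2-102", "ec2-101"]},
}
target = {"events-0": ["k8s-0", "k8s-1"], "events-1": ["k8s-1", "k8s-0"]}

for name, new_replicas in target.items():
    partitions[name]["replicas"] = new_replicas    # data copied to new brokers
    partitions[name]["leader"] = new_replicas[0]   # then leadership shifts

on_ec2 = sum(p["leader"].startswith("ec2") for p in partitions.values())
print(on_ec2)  # 0 -- leadership has fully drained off EC2
```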
ZooKeeper had managed Kafka’s metadata throughout the entire broker migration. Reddit made a deliberate choice not to change the control plane until after the data plane was fully stable on Kubernetes. This separation of concerns reduced the risk of compounding failures.
Once all EC2 brokers were terminated and all data and traffic were running on Kubernetes, Reddit executed the migration from ZooKeeper to KRaft. KRaft is Kafka’s built-in metadata management system that eliminates the need for ZooKeeper.
See the diagram below:
Since Strimzi and Kafka both provide documented steps for this migration, and because the rest of the system had already settled, this final phase was comparatively straightforward.
After both the data plane and the control plane were fully running on Kubernetes, Reddit removed all the configuration overrides that the forked Strimzi operator had introduced.
Control of the clusters was handed off to the standard, unmodified Strimzi operator. The EC2 infrastructure was decommissioned.
Reddit’s migration is a good example of how large-scale infrastructure changes do not have to be dramatic, high-risk events. By breaking the work into small, reversible, well-understood steps and by respecting the constraints the system imposed, Reddit moved a petabyte-scale platform to Kubernetes without a single moment of downtime.
Some key lessons from Reddit’s migration journey were as follows:
Introducing a controllable abstraction layer between clients and infrastructure, whether that is DNS, a proxy, or an API gateway, is one of the highest-leverage changes you can make during a migration. It decouples the two sides and lets you change the infrastructure without forcing every team to update their code.
Metadata and logical state tend to outlive the physical machines they run on. When planning any large migration, treat the logical state as the thing you are protecting, and treat the infrastructure as something you are replacing around it.
Designing each step to be undoable is not just a safety measure. It changes how confidently and quickly you can move forward, because you know you can always step back if something goes wrong.
A migration that looks messy in the middle but never breaks production is far preferable to a clean design that requires a moment where things could go wrong with no recovery path.
2026-03-16 23:31:14
npx workos launches an AI agent, powered by Claude, that reads your project, detects your framework, and writes a complete auth integration directly into your existing codebase. It’s not a template generator. It reads your code, understands your stack, and writes an integration that fits.
The WorkOS agent then typechecks and builds, feeding any errors back to itself to fix.
Every week, Stripe merges over 1,300 pull requests that contain zero human-written code. Not a single line. These PRs are produced by “Minions,” Stripe’s internal coding agents, which work completely unattended.
An engineer sends a message in Slack, walks away, and comes back to a finished pull request that has already passed automated tests and is ready for human review. The productivity boost is compelling.
Here’s what it looks like:

Consider a Stripe engineer who is on-call when five small issues pile up overnight. Instead of working through them sequentially, they open Slack and fire off five messages, each tagging the Minions bot with a description of the fix. Then, they go to get coffee. By the time they come back, five agents have each spun up an isolated cloud machine in under ten seconds, read the relevant documentation, written code, run linters, pushed to CI, and prepared pull requests. The developer reviews them, approves three, sends feedback on one, and discards the last. In other words, five issues were handled in the time it would have taken to fix two manually.
However, the primary reason the Minions work has almost nothing to do with the AI model powering them. It has everything to do with the infrastructure that Stripe built for human engineers, years before LLMs existed. In this article, we will look at how Stripe managed to reach this level.
Disclaimer: This post is based on publicly shared details from the Stripe Engineering Team. Please comment if you notice any inaccuracies.
The AI coding tools you’ve probably encountered fall into a category called attended agents. Tools like Cursor and Claude Code work alongside you. Developers watch them, steer them when they drift, and approve each step.
See the diagram below that shows the typical view of an AI Agent:
Stripe’s engineers use these tools too. However, Minions are what’s known as unattended agents. No one is watching or steering them. The agent receives a task, works through it alone, and delivers a finished result. This distinction changes the design requirements for everything downstream.
Stripe’s codebase also makes this harder than it sounds. The codebase consists of hundreds of millions of lines of code, mostly written in Ruby with Sorbet typing, which is a relatively uncommon stack. The code is full of homegrown libraries that LLMs have never encountered in training data, and it moves well over $1 trillion per year in payment volume through production. The stakes are as extreme as the complexity.
Building a prototype from scratch is fundamentally different from contributing code to a codebase of this scale and maturity. So Stripe built Minions specifically for unattended work, and let third-party tools handle attended coding.
AI coding tools are fast, capable, and completely context-blind. Even with rules, skills, and MCP connections, they generate code that misses your conventions, ignores past decisions, and breaks patterns. You end up paying for that gap in rework and tokens.
Unblocked changes the economics.
It builds organizational context from your code, PR history, conversations, docs, and runtime signals. It maps relationships across systems, reconciles conflicting information, respects permissions, and surfaces what matters for the task at hand. Instead of guessing, agents operate with the same understanding as experienced engineers.
You can:
Generate plans, code, and reviews that reflect how your system actually works
Reduce costly retrieval loops and tool calls by providing better context up front
Spend less time correcting outputs for code that should have been right in the first place
Once Stripe decided to build custom, the first problem was about where to actually run these agents.
An unattended agent needs three properties from its environment:
It needs isolation, so mistakes can’t touch production.
It needs parallelism, so multiple agents can work simultaneously on separate tasks.
And it needs predictability, so every agent starts from a clean, consistent state.
Stripe already had all three. Their “devboxes” are cloud machines pre-loaded with the entire codebase, tools, and services. They spin up in ten seconds because Stripe proactively provisions and warms a pool of them, cloning repositories, warming caches, and starting background services ahead of time. Engineers already used one devbox per task, and a single engineer might have half a dozen running at once. Agents slot into this same pattern.
Since devboxes run in a QA environment, they are already isolated from production data, real user information, and arbitrary network access. That means agents can run with full permissions and no confirmation prompts. The blast radius of any mistake is contained to one disposable machine.
The important thing to understand is that Stripe didn’t build this for agents. They built it for humans. Parallelism, predictability, and isolation were desirable properties for engineers long before LLMs entered the picture. In other words, what’s good for humans is good for agents as well.
A good environment gives the agent a place to work. But it doesn’t tell the agent how to work.
There are two common ways to orchestrate an LLM system:
A workflow is a fixed graph of steps where each step does one narrow thing, and the sequence is predetermined.
An agent is a loop where the LLM decides what to do next based on the results of its previous actions.
Workflows are predictable but rigid. Agents are flexible but unreliable.
Stripe built something in between that they call “blueprints.” A blueprint is a sequence of nodes where some nodes run deterministic code, and other nodes run an agentic loop. Think of it as a structure that alternates between rigid steps and creative steps. For example, the “implement the feature” step or “fix CI failures” step gets the full agentic loop with tools and freedom. On the other hand, the “run linters” step is hardcoded. The “push the branch” step is hardcoded.
This separation matters because some tasks should never be left to the agent’s judgment. You always want linters to run. You always want the branch pushed in a specific way that follows the company’s PR template. Making these deterministic saves tokens, reduces errors, and guarantees that critical steps happen every single time. Across hundreds of runs per day, each deterministic node is one less thing that can go wrong, and that compounds into big reliability gains.
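A blueprint can be sketched as a list of nodes, some deterministic and some agentic. The structure below is an illustration of the idea, not Stripe's implementation, and the agentic steps are stubbed out:

```python
# A "blueprint": rigid steps run hardcoded logic every time; creative
# steps hand control to an agent loop with tools and freedom.

def deterministic(fn):
    return ("deterministic", fn)

def agentic(goal):
    # In the real system this would run a full agent loop with tools;
    # here it is a stub that records what the agent would have done.
    return ("agentic", lambda state: state + [f"agent handled: {goal}"])

blueprint = [
    agentic("implement the requested change"),
    deterministic(lambda state: state + ["linters ran"]),     # always happens
    agentic("fix any CI failures"),
    deterministic(lambda state: state + ["branch pushed per PR template"]),
]

state = []
for kind, step in blueprint:
    state = step(state)

print(state[-1])  # 'branch pushed per PR template' -- guaranteed, never skipped
```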
Blueprints tell the agent how to work. But the agent still needs to know what it’s working with. In a codebase of hundreds of millions of lines, getting the right information into the agent’s limited context window is an engineering challenge.
LLMs can only hold so much text at once. If you try to load every coding rule and convention globally, the agent’s context fills up before it even starts working. Stripe uses global rules “very judiciously” for exactly this reason. Instead, they scope rules to specific subdirectories and file patterns. As the agent moves through the filesystem, it automatically picks up only the rules relevant to where it’s working. These are the same rule files that human-directed tools like Cursor and Claude Code read, so there is no duplication and no agent-specific overhead.
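Directory-scoped rule loading might look roughly like this. The paths, patterns, and rule texts below are hypothetical, invented for illustration:

```python
# Load only the rules whose path patterns match where the agent is
# working, keeping the context window small.

from fnmatch import fnmatch

RULES = {
    "api/**":     "All endpoints must declare type signatures.",
    "billing/**": "Never log raw payment data.",
    "**/*.md":    "Run the docs linter before committing.",
}

def rules_for(path):
    return [text for pattern, text in RULES.items() if fnmatch(path, pattern)]

print(rules_for("billing/invoices.rb"))
# ['Never log raw payment data.']
```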
For information that doesn’t live in the filesystem, Stripe built a centralized internal server called Toolshed. It hosts nearly 500 tools using MCP, which stands for Model Context Protocol and is essentially an industry standard that gives agents a uniform way to call external services. Through MCP, agents can fetch internal documentation, ticket details, build statuses, code search results, and more.
But more tools aren’t better. Agents perform best with a carefully curated subset relevant to their task. Stripe gives Minions a small default set and lets engineers add more when needed.
The agent now has an environment, a structure, and the right context. However, the code still had to be correct, which meant more feedback loops.
Stripe’s feedback architecture works in layers:
First, local linting runs on every push in under five seconds. A background daemon precomputes which lint rules apply and caches the results, so this step is nearly instantaneous.
Second, CI selectively runs tests from Stripe’s battery of over three million tests, and autofixes are applied automatically for known failure patterns.
Third, if failures remain without an autofix, the agent gets one more chance to fix and push again.
Then it stops. At most two rounds of CI. If the code doesn’t pass after the second push, the branch goes back to the human engineer.
This cap is intentional. LLMs show diminishing returns when retrying the same problem repeatedly. More rounds cost more tokens and compute without proportional improvement. Knowing when to stop is as important as knowing how to start.
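The capped loop can be sketched roughly as follows. This is a minimal illustration, not Stripe’s actual implementation; every helper name here (lint, run_ci, try_autofix, agent_fix) is hypothetical:

```python
from dataclasses import dataclass

MAX_CI_ROUNDS = 2  # the hard cap: at most two rounds of CI

@dataclass
class CIResult:
    passed: bool
    failures: list

def run_feedback_loop(branch, lint, run_ci, try_autofix, agent_fix):
    # Deterministic step: linting always runs on every push.
    lint(branch)
    for round_num in range(1, MAX_CI_ROUNDS + 1):
        result = run_ci(branch)  # selective test run
        if result.passed:
            return "ready-for-review"
        # Known failure patterns are fixed deterministically first.
        if try_autofix(branch, result.failures):
            continue
        # Otherwise the agent gets one more attempt, but only if
        # another CI round remains under the cap.
        if round_num < MAX_CI_ROUNDS:
            agent_fix(branch, result.failures)
    # After the second failed round, stop and hand back to a human.
    return "handed-back-to-human"
```

The key design choice is that the stopping rule lives in deterministic orchestration code, not in the agent’s judgment.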
When a Minion run doesn’t fully succeed, it’s still often a useful starting point. A partially correct PR that an engineer can polish in twenty minutes is still a significant win. Stripe’s system is designed for this reality rather than pretending every run will be perfect.
Four layers make Stripe’s Minions work:
Isolated environments that give agents safe, parallel workspaces.
Hybrid orchestration that mixes deterministic guardrails with agentic flexibility.
Curated context that feeds agents the right information without overwhelming them.
And fast feedback loops with hard limits on iteration.
Each layer is necessary, and none alone is sufficient.
The primary insight in Stripe’s approach is that years of investment in developer productivity can pay unexpected dividends once agents join the workflow. Human review didn’t disappear either; it shifted. Engineers moved from writing code to reviewing it.
A key lesson for anyone deploying coding agents: don’t start with model selection. Start with your developer environment, your test infrastructure, and your feedback loops. If those are solid, agents will benefit from them. If they’re not, no model will save you. Stripe’s experience suggests the answer is less about AI breakthroughs and more about the engineering fundamentals that were always supposed to matter.
References:
2026-03-14 23:30:58
On-call shouldn’t feel like constant firefighting. This guide from Datadog breaks down how high-performing SRE teams reduce alert fatigue, streamline incident response, and design rotations that don’t burn engineers out.
You’ll learn how to:
Cut alert noise by tying signals to real user impact
Improve response with clear roles and smarter escalation paths
Turn incidents into feedback loops that improve system reliability
This week’s system design refresher:
What’s Next in AI: 5 Trends to Watch in 2026 (YouTube video)
Git Workflow: Essential Commands
How can Cache Systems go wrong?
Top Cyber Attacks Explained
How AI Actually Generates Images
Git has a lot of commands. Most workflows use a fraction of them. The part that causes problems isn't the commands themselves, it's not knowing where your code sits after running one.
Working directory, staging area, local repo, remote repo. Each command moves code between these. Here's what each one does.
Saving Your Work: “git add” moves files from your working directory to the staging area. “git commit” saves those staged files to your local repository. “git push” uploads your commits to the remote repository.
Getting a Project: “git clone” pulls down the entire remote repository to your machine. “git checkout” switches you to a specific branch.
Syncing Changes: “git fetch” downloads updates from remote but doesn't change your files. “git merge” integrates those changes. “git pull” does both at once.
The Safety Net: “git stash” temporarily shelves your uncommitted changes so you can switch contexts without losing work. “git stash apply” brings them back. “git stash pop” brings them back and deletes the stash.
Building web scrapers for RAG pipelines or model training usually means managing fragile fleets of headless browsers and complex scraping logic. Cloudflare’s new Browser Rendering endpoint changes that. You can now crawl an entire website asynchronously with a single API call. Submit a starting URL, and the endpoint automatically discovers pages, renders them, and returns clean HTML, Markdown, or structured JSON. It fully respects robots.txt out of the box, supports incremental crawling to reduce costs, and includes a fast static mode. Stop managing scraping infrastructure and get back to building your application.
The diagram below shows 4 typical cases where caches can go wrong and their solutions.
Thundering herd problem
This happens when a large number of keys in the cache expire at the same time. Then the query requests directly hit the database, which overloads the database.
There are two ways to mitigate this issue. One is to avoid setting the same expiry time for all keys by adding a random jitter to each key’s expiration. The other is to allow only core business data to hit the database and to prevent non-core data from reaching the database until the cache is back up.
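The jitter idea takes only a few lines. This sketch assumes a cache client with a set(key, value, ttl) method; with a real Redis client the TTL would be passed via the ex argument:

```python
import random

BASE_TTL = 3600   # one-hour base expiry (illustrative value)
JITTER = 300      # up to five minutes of random spread

def set_with_jitter(cache, key, value):
    # Spreading expiry times prevents a wall of keys from expiring
    # in the same instant and stampeding the database.
    ttl = BASE_TTL + random.randint(0, JITTER)
    cache.set(key, value, ttl)
```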
Cache penetration
This happens when the key doesn’t exist in the cache or the database. The application cannot retrieve relevant data from the database to update the cache. This problem creates a lot of pressure on both the cache and the database.
To solve this, there are two suggestions. One is to cache a null value for non-existent keys so repeated lookups don’t hit the database. The other is to use a Bloom filter to check key existence first; if the key doesn’t exist, we avoid hitting the database entirely.
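Null-value caching can be sketched as follows; the sentinel string, TTLs, and the cache/db interfaces are all illustrative assumptions:

```python
NULL_SENTINEL = "__NULL__"  # marker meaning "key confirmed absent"
NULL_TTL = 60               # short TTL so real data can appear later
DATA_TTL = 3600

def get_with_null_caching(cache, db, key):
    cached = cache.get(key)
    if cached == NULL_SENTINEL:
        return None              # known-absent: no database hit
    if cached is not None:
        return cached            # normal cache hit
    value = db.get(key)          # single database lookup on a miss
    if value is None:
        # Remember the absence briefly so repeated lookups for the
        # same non-existent key stop reaching the database.
        cache.set(key, NULL_SENTINEL, NULL_TTL)
    else:
        cache.set(key, value, DATA_TTL)
    return value
```

The short TTL on the sentinel matters: if the key is created later, the stale "absent" marker expires quickly.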
Cache breakdown
This is similar to the thundering herd problem. It happens when a hot key expires, and a large number of requests hit the database at once.
Since hot keys typically account for the bulk of queries (often following an 80/20 pattern), a common mitigation is to not set an expiration time for them at all.
Cache crash
This happens when the cache is down and all the requests go to the database.
There are two ways to solve this problem. One is to set up a circuit breaker so that when the cache is down, the application fails fast instead of letting every request fall through to the database. The other is to run the cache as a cluster to improve its availability.
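A circuit breaker for this case can be sketched as a small state machine; the class name and thresholds here are arbitrary illustrative choices:

```python
import time

class CacheCircuitBreaker:
    """Fail fast while the cache is down instead of flooding the database."""

    def __init__(self, failure_threshold=5, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # trip the breaker

    def allow_request(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            # Half-open: let a probe request through to test recovery.
            self.opened_at = None
            self.failures = 0
            return True
        return False  # open: reject immediately, protect the database
```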
Over to you: Have you met any of these issues in production?
Most attacks follow a sequence of steps. Understanding each step makes it easier to spot where detection or prevention is possible.
Here’s a quick breakdown of how the most common attacks unfold:
Phishing: The attacker sends a fake link pointing to a spoofed login page. The victim enters credentials, the attacker captures them, and uses them to access the real system.
Ransomware: The victim opens a malicious attachment or file. The ransomware encrypts local data and demands payment to restore access. Files stay locked until the ransom is paid or a backup is restored.
Man-in-the-Middle (MitM): The attacker positions themselves between the victim and the server, intercepting traffic in both directions. Neither side detects the interception. The attacker can read or modify data as it passes through.
SQL Injection: Malicious SQL gets inserted into an input field, for example, studentId=117 OR 1=1. The database executes it as a valid query and returns data it shouldn't. A single vulnerable input field can expose an entire table.
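The standard defense is parameterized queries, which treat user input as a single value rather than as executable SQL. A self-contained demonstration using Python’s built-in sqlite3 (the table and data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.execute("INSERT INTO students VALUES (117, 'Alice'), (118, 'Bob')")

user_input = "117 OR 1=1"  # attacker-controlled value

# Vulnerable: string interpolation lets the input rewrite the query's logic,
# so the WHERE clause becomes "id = 117 OR 1=1" and matches every row.
vulnerable = conn.execute(
    f"SELECT name FROM students WHERE id = {user_input}"
).fetchall()

# Safe: the ? placeholder binds the whole input as one value, so
# "117 OR 1=1" is compared as a literal and matches no student id.
safe = conn.execute(
    "SELECT name FROM students WHERE id = ?", (user_input,)
).fetchall()
```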
Cross-Site Scripting (XSS): A malicious script gets injected into a legitimate page. When another user loads that page, their browser executes the script. Session tokens, cookies, and private data can be stolen this way.
Zero-Day Exploits: The attacker finds a vulnerability the vendor hasn't discovered yet. No patch exists. The attack runs until the vendor identifies the issue and ships a fix, which can take days or weeks.
Over to you: Which of these attacks have you seen most often in real environments, and which one do you think is the hardest to defend against today?
There are two main ways modern models generate images: auto-regressive and diffusion.
Auto-regressive models generate an image piece by piece.
During training, an image is split into tokens, and the model learns to predict them one by one, just like text. It minimizes next-token prediction loss over image tokens.
At inference time, the model predicts one image token at a time until the full image is formed.
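The generation loop itself is simple; the intelligence lives in the model. In this sketch, predict_next stands in for the trained model, and a separate decoder (not shown) would typically turn the finished token sequence back into pixels:

```python
def generate_image_tokens(predict_next, prompt_tokens, n_image_tokens):
    # Each new token is conditioned on everything generated so far,
    # exactly like next-token prediction over text.
    tokens = list(prompt_tokens)
    for _ in range(n_image_tokens):
        tokens.append(predict_next(tokens))
    return tokens[len(prompt_tokens):]  # just the image tokens
```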
Diffusion models start from pure noise and iteratively denoise it.
During training, we add noise to real images and train the model to predict that noise.
At inference time, the model starts from random noise and iteratively denoises it into a clean image.
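The forward (noising) step used during training follows the standard DDPM formulation, x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε. A pure-Python sketch, treating an image as a flat list of pixel values:

```python
import math

def add_noise(x0, alpha_bar, eps):
    # alpha_bar near 1 keeps the image mostly intact (early steps);
    # alpha_bar near 0 leaves almost pure noise (late steps).
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]
```

The model is trained to predict eps from the noised sample and the step index; inference then undoes this mixing one step at a time.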
Auto-regressive is like drawing a dog stroke by stroke in sequence. Diffusion is like starting with a rough sketch (coarse shapes), then progressively adding detail and cleaning up the picture.
Over to you: Which text-to-image model do you find most powerful?
2026-03-12 23:31:11
When we hear “stateless architecture”, we often think it means building applications that have no state. That’s the wrong picture, and it can lead to confusion about everything that follows.
Every application has state: user sessions, shopping carts, authentication tokens, preferences. All of that is state. It’s the application’s memory and the very thing that makes personalized digital experiences possible. Without it, every visit to a website would feel like the first time.
In other words, stateless architecture doesn’t eliminate state but relocates it. Understanding where the state moves, why we move it, and what that move costs us is essential for developers.
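The relocation can be illustrated in a few lines. Here a plain dict stands in for an external store such as Redis; in a real deployment, every application instance would read from the same shared store:

```python
SESSION_STORE = {}  # stand-in for a shared external store

def login(token, user):
    # State is written to the shared store, not to instance memory.
    SESSION_STORE[token] = {"user": user}

def handle_request(token, instance_id):
    # Any instance can serve the request, because nothing about the
    # session lives in the instance itself.
    session = SESSION_STORE.get(token)
    user = session["user"] if session else None
    return {"served_by": instance_id, "user": user}
```

Because the instances hold no session state of their own, a load balancer can route each request to any of them.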
In this article, we will understand the nuances of stateless architecture in more detail.