System design and interviewing experts, authors of best-selling books, offer newsletters and courses.

Rss preview of Blog of ByteByteGo

How Stripe’s Minions Ship 1,300 PRs a Week

2026-03-16 23:31:14

npx workos: An AI Agent That Writes Auth Directly Into Your Codebase (Sponsored)

npx workos launches an AI agent, powered by Claude, that reads your project, detects your framework, and writes a complete auth integration directly into your existing codebase. It’s not a template generator. It reads your code, understands your stack, and writes an integration that fits.

The WorkOS agent then typechecks and builds, feeding any errors back to itself to fix.

See how it works →


Every week, Stripe merges over 1,300 pull requests that contain zero human-written code. Not a single line. These PRs are produced by “Minions,” Stripe’s internal coding agents, which work completely unattended.

An engineer sends a message in Slack, walks away, and comes back to a finished pull request that has already passed automated tests and is ready for human review. The productivity gains are compelling.

Here’s what it looks like:

Consider a Stripe engineer who is on-call when five small issues pile up overnight. Instead of working through them sequentially, they open Slack and fire off five messages, each tagging the Minions bot with a description of the fix. Then, they go to get coffee. By the time they come back, five agents have each spun up an isolated cloud machine in under ten seconds, read the relevant documentation, written code, run linters, pushed to CI, and prepared pull requests. The developer reviews them, approves three, sends feedback on one, and discards the last. In other words, five issues were handled in the time it would have taken to fix two manually.

However, the primary reason the Minions work has almost nothing to do with the AI model powering them. It has everything to do with the infrastructure that Stripe built for human engineers, years before LLMs existed. In this article, we will look at how Stripe managed to reach this level.

Disclaimer: This post is based on publicly shared details from the Stripe Engineering Team. Please comment if you notice any inaccuracies.

Why Off-the-Shelf Agents Weren’t Enough

The AI coding tools you’ve probably encountered fall into a category called attended agents. Tools like Cursor and Claude Code work alongside you. Developers watch them, steer them when they drift, and approve each step.

See the diagram below that shows the typical view of an AI Agent:

Stripe’s engineers use these tools too. However, Minions are what’s known as unattended agents. No one is watching or steering them. The agent receives a task, works through it alone, and delivers a finished result. This distinction changes the design requirements for everything downstream.

Stripe’s codebase also makes this harder than it sounds. The codebase consists of hundreds of millions of lines of code, mostly written in Ruby with Sorbet typing, which is a relatively uncommon stack. The code is full of homegrown libraries that LLMs have never encountered in training data, and it moves well over $1 trillion per year in payment volume through production. The stakes are as extreme as the complexity.

Building a prototype from scratch is fundamentally different from contributing code to a codebase of this scale and maturity. So Stripe built Minions specifically for unattended work, and let third-party tools handle attended coding.


Unblocked: Context that saves you time and tokens (Sponsored)

AI coding tools are fast, capable, and completely context-blind. Even with rules, skills, and MCP connections, they generate code that misses your conventions, ignores past decisions, and breaks patterns. You end up paying for that gap in rework and tokens.

Unblocked changes the economics.

It builds organizational context from your code, PR history, conversations, docs, and runtime signals. It maps relationships across systems, reconciles conflicting information, respects permissions, and surfaces what matters for the task at hand. Instead of guessing, agents operate with the same understanding as experienced engineers.

You can:

  • Generate plans, code, and reviews that reflect how your system actually works

  • Reduce costly retrieval loops and tool calls by providing better context up front

  • Spend less time correcting outputs for code that should have been right in the first place

See how it works


The Environment to Run Agents

Once Stripe decided to build custom, the first problem was where to actually run these agents.

An unattended agent needs three properties from its environment:

  • It needs isolation, so mistakes can’t touch production.

  • It needs parallelism, so multiple agents can work simultaneously on separate tasks.

  • And it needs predictability, so every agent starts from a clean, consistent state.

Stripe already had all three. Their “devboxes” are cloud machines pre-loaded with the entire codebase, tools, and services. They spin up in ten seconds because Stripe proactively provisions and warms a pool of them, cloning repositories, warming caches, and starting background services ahead of time. Engineers already used one devbox per task, and a single engineer might have half a dozen running at once. Agents slot into this same pattern.
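
The warm-pool pattern described above can be sketched in a few lines. This is a minimal illustration, not Stripe's implementation; the class and field names are invented, and the slow provisioning work is reduced to a stub.

```python
import queue

class DevboxPool:
    """Toy warm pool: boxes are provisioned ahead of demand so acquire() is fast.
    All names here are illustrative, not Stripe's actual APIs."""

    def __init__(self, warm_target=3):
        self._warm = queue.Queue()
        self.warm_target = warm_target
        self._provisioned = 0
        self.refill()  # pre-provision before any task arrives

    def _provision(self):
        # Stand-in for the slow work: cloning repos, warming caches,
        # starting background services. Done ahead of time, not on demand.
        self._provisioned += 1
        return {"id": self._provisioned, "state": "warm"}

    def refill(self):
        while self._warm.qsize() < self.warm_target:
            self._warm.put(self._provision())

    def acquire(self):
        # Fast path: hand out a pre-warmed box, then top the pool back up.
        box = self._warm.get_nowait()
        self.refill()
        return box

pool = DevboxPool(warm_target=3)
boxes = [pool.acquire() for _ in range(5)]  # five agents, five isolated boxes
```

Each acquire is O(1) because the expensive setup already happened in the background, which is what makes the "ten seconds to a working machine" experience possible for humans and agents alike.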

Since devboxes run in a QA environment, they are already isolated from production data, real user information, and arbitrary network access. That means agents can run with full permissions and no confirmation prompts. The blast radius of any mistake is contained to one disposable machine.

The important thing to understand is that Stripe didn’t build this for agents. They built it for humans. Parallelism, predictability, and isolation were desirable properties for engineers long before LLMs entered the picture. In other words, what’s good for humans is good for agents as well.

Agents Don’t Freestyle

A good environment gives the agent a place to work. But it doesn’t tell the agent how to work.

There are two common ways to orchestrate an LLM system:

  • A workflow is a fixed graph of steps where each step does one narrow thing, and the sequence is predetermined.

  • An agent is a loop where the LLM decides what to do next based on the results of its previous actions.

Workflows are predictable but rigid. Agents are flexible but unreliable.

Stripe built something in between that they call “blueprints.” A blueprint is a sequence of nodes where some nodes run deterministic code, and other nodes run an agentic loop. Think of it as a structure that alternates between rigid steps and creative steps. For example, the “implement the feature” step or “fix CI failures” step gets the full agentic loop with tools and freedom. On the other hand, the “run linters” step is hardcoded. The “push the branch” step is hardcoded.

This separation matters because some tasks should never be left to the agent’s judgment. You always want linters to run. You always want the branch pushed in a specific way that follows the company’s PR template. Making these deterministic saves tokens, reduces errors, and guarantees that critical steps happen every single time. Across hundreds of runs per day, each deterministic node is one less thing that can go wrong, and that compounds into big reliability gains.
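
A blueprint of this shape can be sketched as a list of nodes sharing one interface, where some nodes are plain functions and others wrap an LLM loop. This is a hypothetical sketch of the pattern, not Stripe's code; every name below is invented.

```python
# A blueprint alternates deterministic nodes (hardcoded, always run) with
# agentic nodes (an LLM loop with tools and freedom). Names are illustrative.

def agentic(step_fn, max_turns=10):
    """Wrap a step in an agentic loop: the model acts until it reports done."""
    def run(ctx):
        for _ in range(max_turns):
            ctx, done = step_fn(ctx)
            if done:
                break
        return ctx
    return run

def run_blueprint(ctx, nodes):
    for node in nodes:
        ctx = node(ctx)  # deterministic and agentic nodes share one interface
    return ctx

# Deterministic nodes: guaranteed to happen every single run.
def run_linters(ctx):
    return {**ctx, "linted": True}

def push_branch(ctx):
    return {**ctx, "pushed": True}

# Agentic node: the "implement the feature" step gets the loop. The stand-in
# model just counts attempts and declares done on the second turn.
def implement(ctx):
    attempts = ctx.get("attempts", 0) + 1
    return {**ctx, "attempts": attempts}, attempts >= 2

result = run_blueprint({"task": "fix bug"},
                       [agentic(implement), run_linters, push_branch])
```

The key property is that `run_linters` and `push_branch` cannot be skipped or reordered by the model: they sit outside the loop entirely.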

The Right Context

Blueprints tell the agent how to work. But the agent still needs to know what it’s working with. In a codebase of hundreds of millions of lines, getting the right information into the agent’s limited context window is an engineering challenge.

LLMs can only hold so much text at once. If you try to load every coding rule and convention globally, the agent’s context fills up before it even starts working. Stripe uses global rules “very judiciously” for exactly this reason. Instead, they scope rules to specific subdirectories and file patterns. As the agent moves through the filesystem, it automatically picks up only the rules relevant to where it’s working. These are the same rule files that human-directed tools like Cursor and Claude Code read, so there is no duplication and no agent-specific overhead.
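
Directory-scoped rule lookup amounts to walking the ancestor chain of the file being edited. Here is a minimal sketch under assumed conventions; the rule-file layout and contents are invented for illustration.

```python
from pathlib import PurePosixPath

def rules_for(path, rule_files):
    """Return only the rule sets whose scope is an ancestor of `path`.
    `rule_files` maps a directory prefix to its rules (illustrative layout)."""
    p = PurePosixPath(path)
    scopes = {str(a) for a in p.parents} | {str(p)}
    return [rules for scope, rules in rule_files.items() if scope in scopes]

rule_files = {
    ".": "global rules (used sparingly)",
    "payments": "payments conventions",
    "payments/ledger": "ledger-specific rules",
    "frontend": "frontend conventions",
}

# An agent editing payments/ledger/post.rb picks up three rule sets;
# the frontend rules never enter its context window.
active = rules_for("payments/ledger/post.rb", rule_files)
```

The payoff is that context cost scales with where the agent is working, not with the size of the whole rulebook.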

For information that doesn’t live in the filesystem, Stripe built a centralized internal server called Toolshed. It hosts nearly 500 tools using MCP, which stands for Model Context Protocol and is essentially an industry standard that gives agents a uniform way to call external services. Through MCP, agents can fetch internal documentation, ticket details, build statuses, code search results, and more.

But more tools aren’t better. Agents perform best with a carefully curated subset relevant to their task. Stripe gives Minions a small default set and lets engineers add more when needed.

Fast Feedback, Hard Limits

The agent now has an environment, a structure, and the right context. However, the code still has to be correct, which calls for fast feedback loops.

Stripe’s feedback architecture works in layers:

  • First, local linting runs on every push in under five seconds. A background daemon precomputes which lint rules apply and caches the results, so this step is nearly instantaneous.

  • Second, CI selectively runs tests from Stripe’s battery of over three million tests, and autofixes are applied automatically for known failure patterns.

  • Third, if failures remain without an autofix, the agent gets one more chance to fix and push again.

Then it stops. At most two rounds of CI. If the code doesn’t pass after the second push, the branch goes back to the human engineer.

This cap is intentional. LLMs show diminishing returns when retrying the same problem repeatedly. More rounds cost more tokens and compute without proportional improvement. Knowing when to stop is as important as knowing how to start.
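
The layered loop with its hard cap can be sketched as follows. This is a toy rendering of the control flow described above, with stand-in callables rather than Stripe's real lint, CI, or agent APIs.

```python
def ci_feedback_loop(branch, run_lint, run_ci, agent_fix, max_ci_rounds=2):
    """Layered feedback with a hard iteration cap. All callables are
    hypothetical stand-ins for the systems described in the text."""
    run_lint(branch)                    # fast local lint on every push
    for round_no in range(1, max_ci_rounds + 1):
        failures = run_ci(branch)       # selective tests + known autofixes
        if not failures:
            return {"status": "ready_for_review", "rounds": round_no}
        if round_no < max_ci_rounds:
            branch = agent_fix(branch, failures)  # one more chance to fix
    # Two rounds spent: hand the branch back to a human.
    return {"status": "handed_to_human", "rounds": max_ci_rounds}

# Toy run: CI fails once, and the agent's fix makes the second round pass.
attempts = {"n": 0}
def fake_ci(branch):
    attempts["n"] += 1
    return [] if "fix" in branch else ["test_payments_flow"]
def fake_fix(branch, failures):
    return branch + "+fix"

outcome = ci_feedback_loop("minion/br-1", lambda b: None, fake_ci, fake_fix)
```

Note that the cap lives in the orchestration code, not in a prompt: the model is never asked to decide when to give up.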

When a Minion run doesn’t fully succeed, it’s still often a useful starting point. A partially correct PR that an engineer can polish in twenty minutes is still a significant win. The system is designed for this reality rather than pretending every run will be perfect.

Conclusion

Four layers make Stripe’s Minions work:

  • Isolated environments that give agents safe, parallel workspaces.

  • Hybrid orchestration that mixes deterministic guardrails with agentic flexibility.

  • Curated context that feeds agents the right information without overwhelming them.

  • And fast feedback loops with hard limits on iteration.

Each layer is necessary, and none alone is sufficient.

The primary insight in Stripe’s approach is that years of investment in developer productivity can pay unexpected dividends when agents join the workflow. Human review didn’t disappear; it shifted. Engineers moved from writing code to reviewing it.

A key lesson for anyone deploying coding agents: don’t start with model selection. Start with your developer environment, your test infrastructure, and your feedback loops. If those are solid, agents will benefit from them. If they’re not, no model will save you. Stripe’s experience suggests the answer is less about AI breakthroughs and more about the engineering fundamentals that were always supposed to matter.


EP206: Git Workflow: Essential Commands

2026-03-14 23:30:58

On-call Best Practices for SREs (Sponsored)

On-call shouldn’t feel like constant firefighting. This guide from Datadog breaks down how high-performing SRE teams reduce alert fatigue, streamline incident response, and design rotations that don’t burn engineers out.

You’ll learn how to:

  • Cut alert noise by tying signals to real user impact

  • Improve response with clear roles and smarter escalation paths

  • Turn incidents into feedback loops that improve system reliability

Get the guide


This week’s system design refresher:

  • What’s Next in AI: 5 Trends to Watch in 2026 (YouTube video)

  • Git Workflow: Essential Commands

  • How can Cache Systems go wrong?

  • Top Cyber Attacks Explained

  • How AI Actually Generates Images


What’s Next in AI: 5 Trends to Watch in 2026


Git Workflow: Essential Commands

Git has a lot of commands. Most workflows use a fraction of them. The part that causes problems isn't the commands themselves; it's not knowing where your code sits after running one.

Working directory, staging area, local repo, remote repo. Each command moves code between these. Here's what each one does.

  • Saving Your Work: “git add” moves files from your working directory to the staging area. “git commit” saves those staged files to your local repository. “git push” uploads your commits to the remote repository.

  • Getting a Project: “git clone” pulls down the entire remote repository to your machine. “git checkout” switches you to a specific branch.

  • Syncing Changes: “git fetch” downloads updates from remote but doesn't change your files. “git merge” integrates those changes. “git pull” does both at once.

  • The Safety Net: “git stash” temporarily shelves your uncommitted changes so you can switch contexts without losing work. “git stash apply” brings them back. “git stash pop” brings them back and deletes the stash.


Crawl an Entire Website With a Single API Call (Sponsored)

Building web scrapers for RAG pipelines or model training usually means managing fragile fleets of headless browsers and complex scraping logic. Cloudflare’s new Browser Rendering endpoint changes that. You can now crawl an entire website asynchronously with a single API call. Submit a starting URL, and the endpoint automatically discovers pages, renders them, and returns clean HTML, Markdown, or structured JSON. It fully respects robots.txt out of the box, supports incremental crawling to reduce costs, and includes a fast static mode. Stop managing scraping infrastructure and get back to building your application.

Try the Crawl API today


How can Cache Systems go wrong?

The diagram below shows 4 typical cases where caches can go wrong and their solutions.

  1. Thundering herd problem
    This happens when a large number of keys in the cache expire at the same time. The query requests then hit the database directly, which overloads the database.

    There are two ways to mitigate this issue: one is to avoid setting the same expiry time for all keys by adding a random jitter to each key's TTL; the other is to allow only core business data to hit the database and prevent non-core requests from reaching it until the cache is back up.

  2. Cache penetration
    This happens when the key doesn’t exist in the cache or the database. The application cannot retrieve relevant data from the database to update the cache. This problem creates a lot of pressure on both the cache and the database.

    To solve this, there are two suggestions. One is to cache a null value for non-existent keys, avoiding hitting the database. The other is to use a bloom filter to check the key existence first, and if the key doesn’t exist, we can avoid hitting the database.

  3. Cache breakdown
    This is similar to the thundering herd problem. It happens when a hot key expires, and a large number of requests hit the database.

    Since hot keys typically account for the bulk of queries (roughly an 80/20 split), a common mitigation is to not set an expiration time for them at all.

  4. Cache crash
    This happens when the cache is down and all the requests go to the database.

    There are two ways to solve this problem. One is to set up a circuit breaker so that, when the cache is down, application services fail fast instead of overwhelming the database. The other is to set up a cluster for the cache to improve its availability.
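
Two of these mitigations, TTL jitter for the thundering herd and null caching for penetration, fit in a short sketch. This is a toy in-process cache standing in for Redis or similar; the TTL values and the `NULL` sentinel are illustrative choices.

```python
import random
import time

CACHE = {}  # key -> (value, expires_at); a stand-in for a real cache server

def set_with_jitter(key, value, ttl=300, jitter=60):
    """Thundering herd mitigation: add a random offset to each TTL so a batch
    of keys written together does not expire together."""
    CACHE[key] = (value, time.time() + ttl + random.uniform(0, jitter))

NULL = object()  # sentinel meaning "confirmed absent in the database"

def get_or_load(key, load_from_db):
    """Cache penetration mitigation: cache a null marker for keys that don't
    exist, so repeated misses for the same key never reach the database."""
    entry = CACHE.get(key)
    if entry and entry[1] > time.time():
        value = entry[0]
        return None if value is NULL else value
    value = load_from_db(key)  # may return None for a missing key
    set_with_jitter(key, NULL if value is None else value, ttl=60)
    return value
```

A Bloom filter in front of `get_or_load` would achieve the same penetration defense without storing one null entry per missing key, at the cost of occasional false positives.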

Over to you: Have you met any of these issues in production?


Top Cyber Attacks Explained

Most attacks follow a sequence of steps. Understanding each step makes it easier to spot where detection or prevention is possible.

Here’s a quick breakdown of how the most common attacks unfold:

Phishing: The attacker sends a fake link pointing to a spoofed login page. The victim enters credentials, the attacker captures them, and uses them to access the real system.

Ransomware: The victim opens a malicious attachment or file. The ransomware encrypts local data and demands payment to restore access. Files stay locked until the ransom is paid or a backup is restored.

Man-in-the-Middle (MitM): The attacker positions themselves between the victim and the server, intercepting traffic in both directions. Neither side detects the interception. The attacker can read or modify data as it passes through.

SQL Injection: Malicious SQL gets inserted into an input field, for example, studentId=117 OR 1=1. The database executes it as a valid query and returns data it shouldn't. A single vulnerable input field can expose an entire table.
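
The `studentId=117 OR 1=1` example can be reproduced in a few lines with an in-memory SQLite database, contrasting string interpolation with a parameterized query. The table and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(117, "Ada"), (118, "Lin")])

student_id = "117 OR 1=1"  # hostile input arriving from a form field

# Vulnerable: string interpolation lets the payload become part of the SQL,
# so the WHERE clause matches every row in the table.
leaked = conn.execute(
    f"SELECT * FROM students WHERE id = {student_id}").fetchall()

# Safe: a parameterized query treats the input as a value, never as SQL,
# so the hostile string simply matches nothing.
safe = conn.execute(
    "SELECT * FROM students WHERE id = ?", (student_id,)).fetchall()
```

The fix is the same in every language and driver: never build queries by concatenating user input; always bind it as a parameter.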

Cross-Site Scripting (XSS): A malicious script gets injected into a legitimate page. When another user loads that page, their browser executes the script. Session tokens, cookies, and private data can be stolen this way.

Zero-Day Exploits: The attacker finds a vulnerability the vendor hasn't discovered yet. No patch exists. The attack runs until the vendor identifies the issue and ships a fix, which can take days or weeks.

Over to you: Which of these attacks have you seen most often in real environments, and which one do you think is the hardest to defend against today?


How AI Actually Generates Images

There are two main ways modern models generate images: auto-regressive and diffusion.

Auto-regressive models generate an image piece by piece.

During training, an image is split into tokens, and the model learns to predict them one by one, just like text. It minimizes next-token prediction loss over image tokens.

At inference time, the model predicts one image token at a time until the full image is formed.

Diffusion models start from pure noise and iteratively denoise it.

During training, we add noise to real images and train the model to predict that noise.

At inference time, the model starts from random noise and iteratively denoises it into a clean image.

Auto-regressive is like drawing a dog stroke by stroke in sequence. Diffusion is like starting with a rough sketch (coarse shapes), then progressively adding detail and cleaning up the picture.
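
The two generation loops can be contrasted with a toy sketch: an auto-regressive loop appends one "token" at a time conditioned on what exists so far, while a diffusion-style loop starts from noise and repeatedly nudges the sample toward the data. The stand-in "models" here are trivial functions, purely to show the shape of each loop.

```python
import random

def generate_ar(predict_next, length):
    """Auto-regressive: build the output one token at a time, each token
    conditioned on everything generated so far."""
    tokens = []
    for _ in range(length):
        tokens.append(predict_next(tokens))
    return tokens

def generate_diffusion(denoise_step, size, steps=50):
    """Diffusion-style: start from pure noise and apply repeated small
    denoising steps until a clean sample emerges."""
    x = [random.gauss(0, 1) for _ in range(size)]
    for _ in range(steps):
        x = denoise_step(x)
    return x

# Toy "models": the AR model predicts a ramp; the denoiser pulls every
# pixel 20% of the way toward the target value 0.5 at each step.
ramp = generate_ar(lambda toks: len(toks), length=4)
img = generate_diffusion(lambda x: [0.8 * v + 0.2 * 0.5 for v in x], size=3)
```

The structural difference is visible in the loops themselves: AR's cost grows with output length and is inherently sequential, while diffusion does a fixed number of whole-sample refinement passes.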

Over to you: Which text-to-image model do you find most powerful?

Stateless Architecture: Benefits and Tradeoffs

2026-03-12 23:31:11

When we hear “stateless architecture”, we often think it means building applications that have no state. That’s the wrong picture, and it can lead to confusion about everything that follows.

Every application has state: user sessions, shopping carts, authentication tokens, preferences. All of that is state. It’s the application’s memory and the very thing that makes personalized digital experiences possible. Without it, every visit to a website would feel like the first time.

In other words, stateless architecture doesn’t eliminate state but relocates it. Understanding where the state moves, why we move it, and what that move costs us is essential for developers.

In this article, we will understand the nuances of stateless architecture in more detail.

The Problem with Keeping State on the Server

Read more

How Vimeo Implemented AI-Powered Subtitles

2026-03-11 23:31:15

On-Demand Webinar: Designing for Failure and Speed in Agentic Workflows with FeatureOps (Sponsored)

Join Alex Casalboni (Developer Advocate @ Unleash) for a deep dive on how to design resilient AI workflows to make reversibility a foundational mechanism and release AI-generated code with confidence.

AI writes code in seconds, but reviews take hours. Don’t let this gap slow you down.

Watch our recent webinar to learn how FeatureOps helps you manage risk, contain blast radius, and maintain control over fast-moving agentic workflows.

In this webinar, you’ll learn how to:

  • Reduce blast radius for AI-generated changes

  • Separate deployment from exposure at runtime

  • Build reversibility into agent planning and shipping

Watch Now


Imagine you’re watching a video with AI-generated subtitles. The speaker is mid-sentence, clearly still talking, gesturing, making a point. But the subtitles just vanish, and there are a few seconds of blank screen. Then they reappear as if nothing happened.

This looks like a bug. But it’s a side effect of the AI being too good at translation.

Vimeo’s engineering team ran into this exact problem when they built LLM-powered subtitle translation for their platform. The translations themselves were excellent: fluent, natural, and often indistinguishable from human work. However, the product experience was broken because subtitles kept disappearing mid-playback, and the root cause turned out to be the AI’s own competence.

In this article, we will look at how the Vimeo engineering team overcame this problem and the decisions it made along the way.

Disclaimer: This post is based on publicly shared details from the Vimeo Engineering Team. Please comment if you notice any inaccuracies.

Subtitles Are a Timing Grid

A subtitle file is a sequence of timed slots. Each slot has a start time, an end time, and a piece of text. The video player reads these slots and displays text during each window. Outside that window, nothing shows. If a slot is empty, the screen goes blank for that duration.

This means subtitle translation carries an implicit contract that must be followed. If the source language has four lines, the translation also needs to produce exactly four lines. Each translated line maps to the same time slot as the original. Breaking this contract results in empty slots.
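
The timing grid and its implicit contract can be made concrete with a small sketch. The slot values and the Japanese line below are invented for illustration; this is the shape of the problem, not Vimeo's code.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    start: float  # seconds on the video timeline
    end: float
    text: str

def check_contract(source_slots, translated_lines):
    """The implicit contract: the translation must yield exactly one line per
    source slot. A shortfall means blank screen for the missing slots."""
    blank = len(source_slots) - len(translated_lines)
    return {"ok": blank <= 0, "blank_slots": max(blank, 0)}

source = [
    Slot(0.0, 2.1, "Um, you know, I think that we're gonna get..."),
    Slot(2.1, 4.0, "we're gonna remove a lot of barriers."),
]
# A fluent LLM merges the fragmented thought into one polished line...
merged = ["多くの障壁を取り除くと思います。"]

# ...leaving one time slot with nothing to display.
result = check_contract(source, merged)
```

This is exactly the blank screen bug: the translation is better as language and worse as structure.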

LLMs break this contract by default because they’re optimized for fluency. When an LLM encounters messy, but natural human speech (filler words, false starts, repeated phrases), it does what a good translator would do. It cleans things up and merges fragmented thoughts into a single, polished sentence.

Here’s a concrete example. A speaker in a video says:

“Um, you know, I think that we’re gonna get... we’re gonna remove a lot of barriers.”

That maps to two timed subtitle slots on the video timeline. A traditional translation system handles each line separately, one-to-one. But the LLM recognizes this as a single, fragmented thought and produces one clean Japanese sentence, which is grammatically perfect and semantically accurate. But now the system has two time slots and only one line of text. The second slot goes blank, which means that the subtitles disappear while the speaker keeps talking.

Vimeo calls this the blank screen bug. And it isn’t a rare edge case. It’s the default behavior of any sufficiently capable language model translating messy human speech.

See the picture below:

If you’ve ever built anything that sends LLM output into a system expecting predictable structure (JSON schemas, form fields, database rows), you’ve probably hit a version of this same tension. The model optimizes for quality, and quality doesn’t always respect the structural contract your system depends on.

The Geometry of Language

This problem gets significantly worse when you move beyond European languages.

Different languages don’t just use different words. They organize thoughts in fundamentally different orders and densities. Vimeo’s engineering team started calling this “the geometry of language,” and it essentially signifies that the shape of a sentence changes across languages in ways that make one-to-one line mapping structurally impossible in some cases.

For example, Japanese is far more information-dense than English. Where an English speaker might speak four lines of filler (“Um, so basically,” / “what we’re trying to do” / “is, you know,” / “remove the barriers”), a typical Japanese translation consolidates all of that into a single, grammatically tight sentence.

See the example below:

The LLM is doing the right thing linguistically. Four lines of English filler genuinely are one thought in Japanese. But the subtitle system now has four time slots and enough text for one. Three slots go blank while the speaker keeps talking.

German has a different problem. It places verbs at the end of clauses, creating what linguists call a “verb bracket.” If the subtitle system tries to split a German sentence at a line boundary, the first subtitle hangs grammatically incomplete, missing its verb. The LLM resists producing this because it looks like a syntax error.

Each of these is a structurally different failure mode. The LLM is succeeding at translation while failing at structure. These are two fundamentally different jobs being crammed into a single prompt, and that realization led Vimeo to rethink its architecture.

The Split-Brain Fix

Vimeo tried the obvious approach first: a single LLM prompt that translates the text and preserves the line count. In their words, it was “a losing battle.”

The creative requirement (fluency) was constantly fighting the structural requirement (timing). Asking the model to produce natural-sounding German while also splitting it at exact line boundaries means optimizing for two competing goals at once.

Even research backs this up. A 2024 study by Tam et al. found that imposing format constraints on LLMs measurably degrades their reasoning quality. Stricter constraints often mean worse performance. In other words, you’re not just asking the model to do two things. You’re making it worse at both.

So Vimeo stopped trying to do it all in one pass. They split the pipeline into three phases.

Phase 1: Smart Chunking

Before any translation happens, the system groups source lines into logical thought blocks of roughly 3-5 lines.

If we feed the LLM a single isolated line like “arming the partners,” it has zero context. It doesn’t know who is being armed or why. Feed it the entire transcript, and it loses track of where it is and starts hallucinating, meaning it generates plausible-sounding content that wasn’t in the original. The chunking algorithm scans for sentence boundaries and groups text so the LLM always sees a complete thought before translating.
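
A chunking heuristic of this kind can be sketched briefly. The size bounds and the sentence-boundary rule follow the description above; the regex and the sample lines are illustrative assumptions, not Vimeo's algorithm.

```python
import re

def chunk_lines(lines, min_size=3, max_size=5):
    """Group subtitle lines into thought blocks of roughly min_size..max_size
    lines, preferring to close a chunk at a sentence boundary."""
    chunks, current = [], []
    for line in lines:
        current.append(line)
        ends_sentence = bool(re.search(r"[.!?]\s*$", line))
        if len(current) >= max_size or (len(current) >= min_size and ends_sentence):
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)  # trailing partial thought still gets emitted
    return chunks

lines = [
    "Um, so basically,",
    "what we're trying to do",
    "is, you know,",
    "remove the barriers.",
    "Next we will talk about",
    "rollout plans.",
]
chunks = chunk_lines(lines)
```

The first four lines close as one chunk at the period, so the translator always sees the complete thought rather than an isolated fragment like "is, you know,".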

Phase 2: Creative Translation

Each chunk goes to the LLM with one instruction: translate for meaning. There is no line count enforcement and no structural constraints. The model is free to handle German verb brackets naturally, reorder Hindi syntax correctly, and compress Japanese efficiently. Linguistic quality is the only goal.

Phase 3: Line Mapping

The fluent translated block goes into a second, separate LLM call with a completely different job. This call is purely structural.

The prompt essentially says: “Here are the original four English lines with timestamps. Here is the translated block. Break it back into four lines that match the source rhythm.” The only concern here is line count, not meaning.

By separating these concerns, each pass gets to do its job without compromise. Phase 2 ensures the translation is grammatically sound. Phase 3 ensures the timing is respected. And on the first pass through this pipeline, roughly 95% of chunks map perfectly.

Designing for the Five Percent

Ninety-five percent is impressive. But Vimeo ships subtitles to millions of viewers across nine languages, so the remaining five percent matters a great deal.

This is where Vimeo’s engineering philosophy gets interesting. They stopped asking “how do we make the LLM get it right the first time?” and started asking “what happens when it doesn’t?” That reframe shaped everything about how they built the production system.

When the line mapper returns a mismatch (say, one line of German when it asked for two), the system doesn’t give up. It enters a correction loop, retrying with explicit feedback about the error. The prompt tells the model what went wrong and asks it to try again. Often, the model finds a valid synonym or slightly less natural phrasing that respects the line count. This correction loop resolves about a third of failures.

If that doesn’t work, the system escalates to a simpler LLM prompt. It strips away all the semantic instructions and gives the model a bare-bones task: “Here’s one block of text. Split it into exactly N lines.”

And if the LLM still can’t produce the right count, Vimeo stops asking models entirely. A rule-based algorithm takes over. Empty lines get filled with the last valid content. Too few lines get padded by duplicating text. Too many lines get truncated. The output in these edge cases is functional rather than perfect. You might see the same phrase repeated across a couple of subtitle slots. But every time slot gets filled, and the viewer never sees a blank screen.

This graduated fallback is what separates a production system from a demo. Vimeo’s data showed that the correction loop resolves about 32% of the failures that make it past the first pass. The rule-based splitter catches everything else. The result is 100% of chunks reach the user in a valid state.
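
The final, rule-based rung of that fallback chain is simple enough to sketch directly. This follows the fill/pad/truncate rules described above, with invented function names and a sample German line; it is an illustration of the policy, not Vimeo's code.

```python
def force_line_count(lines, n):
    """Last-resort rule-based fixup, used only after the LLM retries are
    exhausted: guarantee exactly n non-empty lines so no slot goes blank."""
    fixed = []
    last = ""
    for line in lines[:n]:          # too many lines: truncate to n
        line = line.strip()
        if line:
            last = line
        fixed.append(line or last)  # empty lines: fill with last valid content
    while len(fixed) < n:
        fixed.append(last)          # too few lines: pad by duplicating
    return fixed

# Two time slots, one translated line: the second slot repeats the phrase
# instead of the screen going blank.
out = force_line_count(["Wir entfernen viele Barrieren."], 2)
```

The output in these cases is functional rather than perfect, which is exactly the product tradeoff discussed below: a repeated phrase beats a blank screen.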

Conclusion

This architecture works. But it carries real costs.

The multi-pass approach adds roughly 4-8% more processing time and 6-10% more token cost compared to a single-call translation. Vimeo argues this tradeoff pays for itself by eliminating around 20 hours of manual QA per 1,000 videos. At their scale, the math clearly works. At a smaller scale, it might not.

There’s also an uncomfortable quality gap across languages. The system ensures every language gets functional subtitles, but speakers of structurally different languages hit the fallback chain far more often than Spanish or Italian speakers. That means more repeated phrases, more slightly awkward line splits. The system guarantees no blank screens, but it doesn’t guarantee equal quality.

And there’s a genuine product question embedded in the rule-based fallback. When the system repeats a phrase across two subtitle slots to avoid a blank screen, is that actually a better viewer experience? Vimeo decided it is. That’s a reasonable product call, but it’s worth recognizing as a decision, not a technical inevitability.

The broader value of Vimeo’s experience reaches well beyond video subtitles. If you’re building AI into any product with structural constraints, their journey points to three principles:

  • First, separate the creative work from the structural work. Asking one LLM call to be both brilliant and obedient means optimizing for competing goals, and research suggests that makes it worse at both.

  • Second, build your fallback chain before you build your happy path. The question isn’t how to prevent failures. It’s what your system does when they happen.

  • Third, accept that smarter models need smarter infrastructure around them. A simple word-for-word translator would never break subtitle sync. Intelligence is what creates the engineering challenge. Vimeo calls this the “infrastructure tax of intelligence,” and it’s a cost worth understanding before you build.


How Airbnb Rolled Out 20+ Local Payment Methods in 360 Days

2026-03-10 23:31:17

It’s not just about getting the prompt right. (Sponsored)

When trying to spin up AI agents, companies often get stuck in the prompting weeds and end up with agents that don’t deliver dependable results. This ebook from You.com goes beyond the prompt, revealing five stages for building a successful AI agent and why most organizations haven’t gotten there yet.

In this guide, you’ll learn:

  • Why prompts alone aren’t enough and how context and metadata unlock reliable agent automation

  • Four essential ways to calculate ROI, plus when and how to use each metric

If you’re ready to go beyond the prompt, this is the ebook for you.

Get the Guide


For years, Airbnb supported credit and debit cards as the primary way guests could pay for accommodations.

However, today Airbnb operates in over 220 countries worldwide, and while cards work well in many regions, relying on them alone excludes millions of potential users. In countries where credit card penetration is low or where people strongly prefer local payment methods, Airbnb was losing bookings and limiting its growth potential.

To solve this problem, the Airbnb Engineering Team launched the “Pay as a Local” initiative. The goal was to integrate 20+ locally preferred payment methods across multiple markets in just 14 months.

In this article, we will look at the technical architecture and engineering decisions that made this expansion possible.

Disclaimer: This post is based on publicly shared details from the Airbnb Engineering Team. Please comment if you notice any inaccuracies.

Understanding Local Payment Methods

Local Payment Methods, or LPMs, extend beyond traditional payment cards. They include digital wallets like Naver Pay in South Korea and M-Pesa in Kenya, online bank transfers used across Europe, instant payment systems like PIX in Brazil and UPI in India, and regional payment schemes like EFTPOS and Cartes Bancaires.

Supporting LPMs provides several advantages.

  • First, offering familiar payment options increases conversion rates at checkout.

  • Second, it unlocks markets where credit card usage is minimal or nonexistent.

  • Third, it improves accessibility for guests who lack credit cards or traditional banking access.

The Airbnb team identified over 300 unique payment methods worldwide through initial research. For the first phase, they used a qualification framework to narrow this list. They evaluated the top 75 travel markets, selected the top one or two payment methods per market, excluded methods without clear travel use cases, and arrived at a shortlist of just over 20 LPMs suited for integration.

Modernizing the Payment Platform

Before building support for LPMs, Airbnb needed to modernize its payment infrastructure.

The original system was monolithic, meaning all payment logic existed in one large codebase. This architecture created several problems:

  • Adding new features took considerable time, and time to market for new capabilities was measured in months.

  • Different teams couldn’t work independently.

  • The system was difficult to scale.

Airbnb implemented a multi-year replatforming initiative called Payments LTA, where LTA stands for Long-Term Architecture. The team shifted from a monolithic system to a capability-oriented services system structured by domains. This approach uses domain-driven decomposition, where the system is broken into smaller services based on business capabilities.

See the diagram below that shows a sample domain-driven approach:

After this decomposition exercise, the core payment domain at Airbnb consisted of multiple subdomains:

  • The Pay-in subdomain handles guest payments.

  • The Payout subdomain manages host payments.

  • Transaction Fulfillment oversees the complete transaction lifecycle.

  • The Processing subdomain integrates with third-party payment service providers.

  • The Wallet and Instruments subdomain stores payment methods.

  • The Ledger subdomain records transactions.

  • The Incentives and Stored Value subdomain manages credits and coupons.

  • The Issuing subdomain creates payment instruments.

  • The Settlement and Reconciliation subdomain ensures accurate money flows.

This modernization approach reduced time to market for new features, increased code reusability and extensibility, and empowered greater team autonomy by allowing teams to work on specific domains independently.


Using AI agents to debug production incidents from AI-generated code (Sponsored)

With AI generating ~80% of code, production systems are getting harder to debug with every deployment. Investigating issues requires following a trail of breadcrumbs across code, infrastructure, telemetry, and documents.

Teams are keeping up with production at scale using AI agents that investigate like senior engineers, forming hypotheses, running queries across systems, and converging on root cause with evidence.

Engineering teams at Coinbase and DoorDash are using Resolve AI to cut incident investigation time by 70%+.

Explore how AI agents help investigate issues in this interactive demo.

Click through a real investigation →


Building the Multi-Step Transaction Framework

The Processing subdomain became particularly important for LPM integration.

Airbnb adopted a connector and plugin-based architecture for onboarding new payment service providers, or PSPs.

During replatforming, the team introduced Multi-Step Transactions, abbreviated as MST. This processor-agnostic framework supports payment flows completed across multiple stages.

For example, traditional card payments happen in a single step where you enter your card details and receive an immediate response. However, many local payment methods require multiple steps, such as redirecting to another website, authenticating with a separate app, or scanning a QR code.

MST defines a PSP-agnostic transaction language to describe the intermediate steps required in a payment. These steps are called Actions. Common action types include redirects to external websites or apps, strong customer authentication frictions like security challenges and fingerprinting, and payment method-specific flows unique to each LPM.

When a PSP indicates that an additional user action is required, its vendor plugin normalizes the request into an ActionPayload and returns it with a transaction intent status of ACTION_REQUIRED. This architecture ensures consistent handling of complex, multi-step payment experiences across diverse PSPs and markets.

Here is an example of what an ActionPayload looks like in JSON format:

{
  "actionPayload": {
    "actionType": "redirect",
    "actionParameters": {
      "redirectUrl": "https://pspvendor1...",
      "method": "GET"
    }
  }
}

Source: Airbnb Engineering Blog
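To make the normalization step concrete, here is a minimal sketch of how a vendor plugin might map a PSP-specific response onto the PSP-agnostic shape above. The vendor field names (`next_action`, `redirect_to`) are assumptions for illustration; only the `ActionPayload` shape and the `ACTION_REQUIRED` status come from the text.

```python
ACTION_REQUIRED = "ACTION_REQUIRED"
SUCCEEDED = "SUCCEEDED"

def normalize_psp_response(raw):
    """Map a hypothetical vendor-specific response onto the
    PSP-agnostic transaction language described above."""
    if raw.get("next_action") == "redirect_browser":
        return {
            "status": ACTION_REQUIRED,
            "actionPayload": {
                "actionType": "redirect",
                "actionParameters": {
                    "redirectUrl": raw["redirect_to"],
                    "method": "GET",
                },
            },
        }
    # No further user action needed: the charge completed in one step.
    return {"status": SUCCEEDED, "actionPayload": None}

intent = normalize_psp_response(
    {"next_action": "redirect_browser", "redirect_to": "https://psp.example/pay"}
)
```

Because every plugin emits the same normalized intent, the core platform can handle redirects, challenges, and QR flows from any PSP with one code path.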

Three Foundational Payment Flows

While the modernized payment platform laid the foundation for enabling LPMs, these payment methods introduced unique challenges.

For example, many local methods require users to complete transactions in third-party wallet apps, introducing complexity in app switching, session handoff, and synchronization between Airbnb and external digital wallets. Each local payment vendor also exposes different APIs and behaviors across charge, refund, and settlement flows.

The Airbnb team analyzed the end-to-end behavior of their 20+ LPMs and identified three foundational payment flows that capture the full spectrum of user and system interactions.

The Redirect Flow

The first is the redirect flow. In this pattern, guests are redirected to a third-party site or app to complete the payment, then return to Airbnb to finalize their booking. Examples include Naver Pay, GoPay, and FPX. The process works as follows:

  • Airbnb’s payments platform sends a charge request to the local payment vendor

  • The vendor’s response includes a redirectUrl

  • The platform redirects the user to the external app or website

  • The user completes the payment

  • The user is redirected back to Airbnb with a result token

  • Airbnb’s payments platform uses this token to confirm and finalize the payment securely
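The redirect steps above can be sketched as two server-side operations. This is an illustrative toy, not Airbnb’s implementation: the vendor URL is a placeholder, and the result token is modeled as an HMAC over the order ID purely to show why the returning token must be verified before the booking is finalized.

```python
import hashlib
import hmac

SECRET = b"demo-signing-key"  # assumption: result tokens are verifiable via a shared secret

def start_redirect_charge(order_id):
    """Steps 1-3: request a charge and hand the user a redirect URL."""
    token = hmac.new(SECRET, order_id.encode(), hashlib.sha256).hexdigest()
    return {"redirectUrl": f"https://vendor.example/pay?order={order_id}",
            "token": token}

def finalize_payment(order_id, result_token):
    """Steps 5-6: the user returns with a result token; verify it
    before confirming the booking, never trusting the redirect alone."""
    expected = hmac.new(SECRET, order_id.encode(), hashlib.sha256).hexdigest()
    return "CONFIRMED" if hmac.compare_digest(expected, result_token) else "REJECTED"

charge = start_redirect_charge("order-42")
status = finalize_payment("order-42", charge["token"])
```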

The Async Flow

The second is the async flow, where “async” stands for asynchronous. Guests complete payment externally after receiving a prompt, such as a QR code or push notification. Airbnb receives payment confirmation asynchronously via webhooks. Examples include PIX, MB Way, and Blik. The process works as follows:

  • Airbnb’s payments platform sends a charge request to the local payment vendor.

  • The vendor’s response includes QR code data.

  • The checkout page displays the QR code for the user to scan.

  • The user completes the payment in their wallet app.

  • After payment succeeds, the vendor sends a webhook notification to Airbnb.

  • The platform updates the payment status and confirms the order.

See the diagram below:
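The async steps above hinge on the webhook handler, which must be idempotent because vendors can redeliver the same event. Here is a minimal sketch under that assumption; the dictionary store and field names are hypothetical.

```python
payments = {}  # order_id -> status; stands in for a persistent store

def create_async_charge(order_id):
    """Steps 1-3: create the charge and surface QR data to checkout."""
    payments[order_id] = "PENDING"
    return {"qrCode": f"qr-data-for-{order_id}"}  # placeholder QR payload

def handle_webhook(event):
    """Steps 5-6: the vendor notifies us asynchronously; update status.
    Idempotent: a replayed webhook must not change a settled payment."""
    order_id = event["order_id"]
    if payments.get(order_id) == "PENDING" and event["result"] == "paid":
        payments[order_id] = "CONFIRMED"
    return payments.get(order_id)

create_async_charge("order-7")
handle_webhook({"order_id": "order-7", "result": "paid"})
```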

The Direct Flow

The third is the direct flow.

Guests enter their payment credentials directly within Airbnb’s interface, allowing real-time processing similar to traditional card payments. Examples include Cartes Bancaires and Apple Pay.

Config-Driven Integration

Airbnb embraced a config-driven approach powered by a central YAML-based Payment Method Config.

This file acts as a single source of truth for flows, eligibility rules, input fields, refund rules, and other critical details. Instead of scattering payment method logic across frontend code, backend services, and various other systems, the team consolidated all relevant details in this config.

Both core payment services and frontend experiences reference this single source of truth. This ensures consistency for eligibility checks, UI rendering, and business rules. The unified approach dramatically reduces duplication, manual updates, and errors across the technology stack.

See the diagram below:

These configs also drive automated code generation for backend services. Using code generation tools, the system produces Java classes, DTOs (Data Transfer Objects), enums, database schemas, and integration scaffolding. As a result, integrating or updating a payment method becomes largely declarative. You simply make a config change rather than writing extensive new code. This streamlines launches from months to weeks and makes ongoing maintenance far simpler.
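To make this concrete, a Payment Method Config entry might look roughly like the following. The real schema is internal to Airbnb; every field name here is an assumption, chosen to reflect the flows, eligibility rules, input fields, and refund rules the text describes.

```yaml
# Hypothetical sketch of one Payment Method Config entry.
payment_method: PIX
flow: ASYNC                 # REDIRECT | ASYNC | DIRECT
eligibility:
  countries: [BR]
  currencies: [BRL]
input_fields:
  - name: first_name
    required: true
  - name: last_name
    required: true
  - name: cpf               # Brazilian tax identification number
    required: true
    validation: "^\\d{11}$"
refund:
  supported: true
  window_days: 90
```

A declarative entry like this is what the code-generation tooling would consume to emit Java classes, DTOs, enums, and schemas.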

Dynamic Payment Widget

The payment widget is the payment method UI embedded into the checkout page. It includes the list of available payment methods and handles user inputs. Local payment methods often require specialized input forms and have unique country and currency eligibility requirements.

For example, PIX in Brazil requires the guest’s first name, last name, and CPF, which is the Brazilian tax identification number. Rather than hardcoding forms and rules into the client applications, Airbnb centralizes both form field specification and eligibility checks in the backend.

Servers send configuration payloads to clients, defining exactly which fields to collect, which validation rules to apply, and which payment options to render. This empowers the frontend to dynamically adapt UI and validation for each payment method. Teams can accelerate launches and keep user experiences current without requiring frequent client releases.

See the diagram below:
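The server-driven approach described above can be sketched as a client validator that is driven entirely by the configuration payload. The payload shape and field names are hypothetical, modeled on the PIX example in the text.

```python
import re

# Hypothetical server-sent configuration payload for the PIX form.
FORM_CONFIG = {
    "paymentMethod": "PIX",
    "fields": [
        {"name": "first_name", "required": True},
        {"name": "last_name", "required": True},
        {"name": "cpf", "required": True, "pattern": r"^\d{11}$"},
    ],
}

def validate_form(config, values):
    """Client-side validation driven entirely by the server config:
    the client ships no per-payment-method logic of its own."""
    errors = []
    for field in config["fields"]:
        value = values.get(field["name"], "")
        if field.get("required") and not value:
            errors.append(f"{field['name']} is required")
        elif "pattern" in field and value and not re.match(field["pattern"], value):
            errors.append(f"{field['name']} is invalid")
    return errors

errors = validate_form(
    FORM_CONFIG, {"first_name": "Ana", "last_name": "Silva", "cpf": "123"}
)
```

Changing a field or rule on the server updates every client on the next fetch, with no app release.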

Testing Infrastructure

Testing local payment methods presents unique challenges.

Developers often don’t have access to local wallets. For example, a developer in the United States cannot easily test PIX, which requires a Brazilian bank account. Yet with such a broad range of payment methods and complex flows, comprehensive testing is essential to prevent regressions and ensure seamless functionality.

To address this challenge, Airbnb enhanced its in-house Payment Service Provider Emulator. See the diagram below:

This tool enables realistic simulation of PSP interactions for both redirect and asynchronous payment methods. The Emulator allows developers to test end-to-end payment scenarios without relying on unstable or nonexistent PSP sandboxes.

For redirect payments, the Emulator provides a simple UI mirroring PSP acquirer pages. Testers can explicitly approve or decline transactions for precise scenario control. For async methods, it returns QR code details and automatically schedules webhook emission tasks upon receiving a payment request. This delivers a complete, reliable testing environment across diverse LPMs.
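The emulator’s async behavior can be sketched as an endpoint that responds with QR data immediately while queuing a success webhook for later delivery. This is an illustrative model only; the endpoint shape and queue are assumptions, not the emulator’s actual design.

```python
webhook_queue = []  # stands in for the emulator's scheduled webhook tasks

def emulator_charge(request):
    """Emulated PSP endpoint for an async method: return QR data now
    and schedule a success webhook, as the text describes."""
    order_id = request["order_id"]
    webhook_queue.append({"order_id": order_id, "result": "paid", "delay_s": 2})
    return {"qrCode": f"emulated-qr-{order_id}", "status": "PENDING"}

def drain_webhooks(deliver):
    """Later, a scheduler drains the queue and delivers each event to
    the merchant's webhook endpoint (here: any callable)."""
    while webhook_queue:
        deliver(webhook_queue.pop(0))

received = []
emulator_charge({"order_id": "test-1"})
drain_webhooks(received.append)
```

Because the emulator controls both the synchronous response and the webhook timing, developers can exercise success, decline, and timeout paths deterministically.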

Centralized Observability

Maintaining high reliability and availability is critical for Airbnb’s global payment system.

As the team expanded to support many new local payment methods, they faced increasing complexity. There were greater dependencies on external PSPs and wide variations in payment behaviors. A real-time card payment and a redirect flow like Naver Pay follow completely different technical paths.

Without proper visibility, regressions can go unnoticed until they affect real users. As dozens of new LPMs went live, observability became the foundation of reliability. Airbnb built a centralized monitoring framework that unifies metrics across all layers, from client to PSP.

When launching a new LPM, onboarding requires a single config change. Add the method name, and metrics begin streaming automatically. The system tracks four layers:

  • Client metrics showing user-level flow health from client applications

  • Payment backend metrics providing API-level metrics for payment flows

  • PSP metrics offering API-level visibility between Airbnb and the PSP

  • Webhook metrics tracking async completion status for redirect methods or refunds

Airbnb also standardized alerting rules across platform layers using composite alerts and anomaly detection. Each alert follows a consistent pattern with failure count, failure rate, and time window thresholds. An example alert might state: “Naver Pay resume failures greater than 5 and failure rate greater than 20% in 30 minutes.” This design minimizes false positives during low-traffic periods.
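The composite alert pattern above is easy to express directly: fire only when both the absolute failure count and the failure rate exceed their thresholds within the window. A minimal sketch, with thresholds taken from the Naver Pay example in the text:

```python
def composite_alert(failures, total, min_failures=5, min_rate=0.20):
    """Evaluate one time window: alert only when BOTH the failure
    count and the failure rate exceed their thresholds, which
    suppresses false positives during low-traffic periods."""
    if total == 0:
        return False
    rate = failures / total
    return failures > min_failures and rate > min_rate

# Low traffic: 3 failures out of 4 is a 75% rate, but too few events to page.
low_traffic = composite_alert(failures=3, total=4)
# Real regression: 60 failures out of 200 trips both thresholds.
regression = composite_alert(failures=60, total=200)
```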

Conclusion

Two examples demonstrate the impact of this work.

  • Naver Pay is one of the fastest-growing digital payment methods in South Korea. As of early 2025, it reached over 30.6 million active users, representing approximately 60% of the South Korean population. Enabling Naver Pay delivered a more seamless and familiar payment experience for local guests while expanding Airbnb’s reach to new users who prefer Naver Pay as their primary payment method.

  • PIX is an instant payment system developed by the Central Bank of Brazil. By late 2024, more than 76% of Brazil’s population was using PIX, making it the country’s most popular payment method, surpassing cash, credit cards, and debit cards. In 2024 alone, PIX processed over 26.4 trillion Brazilian reals, approximately 4.6 trillion US dollars, in transaction volume. This underscores its pivotal role in Brazil’s digital payment ecosystem.

The Pay as a Local initiative delivered significant business and technical impact. Airbnb observed booking uplift and new user acquisition in markets where they launched local payment methods. Integration time was reduced through reusable flows and config-driven automation. Reliability improved through enhanced observability for early outage detection, standardized testing to prevent regressions, and streamlined vendor escalation and on-call processes for global resilience.

In other words, supporting local payment methods helps Airbnb remain competitive and relevant in the global travel industry. These payment options improve checkout conversion, drive adoption, and unlock new growth opportunities.


Top AI GitHub Repositories in 2026

2026-03-09 23:31:31

Give any AI agent access to Google search with SerpApi (Sponsored)

Agents are smarter when they can search the web - but with SerpApi, you don’t need to reinvent the wheel. SerpApi gives your AI applications clean, structured web data from major search engines and marketplaces, so your agents can research, verify, and answer with confidence.

Access real-time data with a simple API.

Start For Free


The AI open-source ecosystem has entered an extraordinary growth phase.

GitHub’s Octoverse 2025 report revealed that over 4.3 million AI-related repositories now exist on the platform, a 178% year-over-year jump in LLM-focused projects alone. In this environment, a select group of repositories has emerged as clear frontrunners, each amassing tens or even hundreds of thousands of stars by offering developers the tools to build autonomous agents, deploy models locally, and streamline AI-powered workflows.

Let’s look at the most impactful AI repositories trending on GitHub right now, covering what they do, why they matter, and how they fit into the broader AI landscape.

OpenClaw

OpenClaw is the breakout star of 2026 and arguably the fastest-growing open-source project in GitHub history.

Created by PSPDFKit founder Peter Steinberger, it surged from 9,000 to over 60,000 stars in just a few days after going viral in late January 2026, and has since blown past 210,000 stars. The project was originally named Clawdbot, then Moltbot, and finally settled on OpenClaw.

At its core, OpenClaw is a personal AI assistant that runs entirely on your own devices. It operates as a local gateway connecting AI models to over 50 integrations, including WhatsApp, Telegram, Slack, Discord, Signal, and iMessage. Unlike cloud-based assistants, your data never leaves your machine. The assistant is always on, capable of browsing the web, filling out forms, running shell commands, writing and executing code, and controlling smart home devices. What sets it apart from other AI tools is its ability to write its own new skills, effectively extending its own capabilities without manual intervention.

OpenClaw has found use across developer workflow automation, personal productivity management, web scraping, browser automation, and proactive scheduling. On February 14, 2026, Steinberger announced he would be joining OpenAI, and the project would transition to an open-source foundation. Security researchers have raised valid concerns about the broad permissions the agent requires to function, and the skill repository still lacks rigorous vetting for malicious submissions, so users should be mindful of these risks when configuring their instances.

n8n

n8n is an open-source workflow automation platform that combines a visual, no-code interface with the flexibility of custom code, now enhanced with native AI capabilities. It has over 400 integrations and a self-hosted, fair-code license. This gives technical teams full control over their automation pipelines and data.

The platform stands out due to its AI-native approach. Users can incorporate large language models directly into their workflows via LangChain integration. They can build custom AI agent automations alongside traditional API calls, data transformations, and conditional logic. This bridges the gap between conventional business automation tools and cutting-edge AI agent workflows. For enterprises with strict data governance requirements, the self-hosting option is particularly valuable.

Common applications include AI-driven email triage, automated content pipelines, customer support agent flows, data enrichment workflows, and multi-step AI processing chains.

Ollama

In a landscape dominated by cloud API subscriptions, Ollama took the opposite approach. It is a lightweight framework written in Go for running and managing large language models entirely on your own hardware. No data is sent to external services, and the entire experience is designed to work offline.

Ollama provides simple commands to download, run, and serve models locally, supporting Llama, Mistral, Gemma, DeepSeek, and a growing list of others. It includes desktop apps for macOS and Windows, which means even non-developers can get started with local AI. Its partnerships to support open-weight models from major research labs drove a massive surge of interest.

The project has become the backbone of the local AI movement, enabling developers to experiment with and deploy LLMs in privacy-critical or cost-sensitive environments. It pairs quite well with tools like Open WebUI to create a fully self-hosted alternative to commercial AI chat products.


Unblocked: Context that saves you time and tokens (Sponsored)

AI coding tools are fast, capable, and completely context-blind. Even with rules, skills, and MCP connections, they generate code that misses your conventions, ignores past decisions, and breaks patterns. You end up paying for that gap in rework and tokens.

Unblocked changes the economics.

It builds organizational context from your code, PR history, conversations, docs, and runtime signals. It maps relationships across systems, reconciles conflicting information, respects permissions, and surfaces what matters for the task at hand. Instead of guessing, agents operate with the same understanding as experienced engineers.

You can:

  • Generate plans, code, and reviews that reflect how your system actually works

  • Reduce costly retrieval loops and tool calls by providing better context up front

  • Spend less time correcting outputs for code that should have been right in the first place

See how it works


Langflow

Langflow is a low-code platform for designing and deploying AI-powered agents and retrieval-augmented generation (RAG) workflows, built on top of the LangChain framework. It provides a drag-and-drop interface for constructing chains of prompts, tools, memory modules, and data sources, with support for all major LLMs and vector databases.

Developers can visually orchestrate multi-agent conversations, manage memory and retrieval layers, and then deploy those flows as APIs or standalone applications. This eliminates the need for extensive backend engineering when prototyping complex AI pipelines. What used to take weeks of coding can often be assembled in an afternoon.

Langflow has attracted a significant community of data scientists and engineers. Its most common use cases include RAG pipeline prototyping, multi-agent conversation design, custom chatbot creation, and rapid LLM application development.

Dify

Dify is a production-ready platform for agentic workflow development, offering an all-in-one toolchain to build, deploy, and manage AI applications. Written primarily in TypeScript, it handles everything from enterprise QA bots to AI-driven custom assistants.

The platform includes a workflow builder for defining tool-using agents, built-in RAG pipeline management, support for multiple AI model providers, including OpenAI, Anthropic, and various open-source LLMs, usage monitoring, and both local and cloud deployment options. It also supports Model Context Protocol (MCP) integration. Dify handles the infrastructure boilerplate so teams can focus on crafting their agent logic.

Dify fills a crucial gap for teams that want to stand up AI-powered services quickly under an open-source, self-hostable framework. Its use cases span enterprise chatbot deployment, AI-powered internal tools, customer support automation, and multi-model orchestration.

LangChain

LangChain has cemented its position as the foundational framework for building reliable AI agents in Python. It provides modular components for chains, agents, memory, retrieval, tool use, and multi-agent orchestration. Its companion project, LangGraph, extends this further with support for complex, stateful agent workflows that include cycles and conditional branching.

Many of the other projects on this list build on top of LangChain or integrate with it directly, making it the connective tissue of the AI agent ecosystem. It has great support from Anthropic, OpenAI, Google, and every major model provider.

Developers use LangChain for building multi-agent systems, tool-using AI agents, RAG pipelines, conversational AI applications, and structured data extraction.

Open WebUI

Open WebUI is a self-hosted AI platform designed to operate entirely offline, with over 282 million downloads and 124k+ stars. It provides a polished, ChatGPT-style web interface that connects to Ollama, OpenAI-compatible APIs, and other LLM runners, all installable from a single pip command.

The feature set is extensive. It includes a built-in inference engine for RAG, hands-free voice and video call capabilities with multiple speech-to-text and text-to-speech providers, a model builder for creating custom agents, native Python function calling, and persistent artifact storage. For enterprise users, it offers SSO, role-based access control, and audit logs. A community marketplace of prompts, tools, and functions makes extending the platform straightforward.

If Ollama provides the engine for running local models, Open WebUI provides the interface. Together, they form a popular self-hosted AI stack. Its primary use cases include private ChatGPT replacements, multi-model comparison setups, team AI platforms, and RAG-powered document question-answering systems.

DeepSeek-V3

DeepSeek-V3 stunned the AI community by delivering benchmark results that rival frontier closed models like GPT-4, as a fully open-weight release.

Built on a Mixture-of-Experts (MoE) architecture, it is optimized for general-purpose reasoning and supports ultra-long 128K token contexts. The team introduced novel training techniques, including distilled reasoning chains, that set new standards in the open model community.

DeepSeek-V3 proved that the open-source community can produce models competitive with the best proprietary offerings. This has significant implications for developers who want high-capability AI without vendor lock-in, recurring API costs, or data privacy concerns. The model is available for free commercial use and can be fine-tuned for domain-specific applications. It runs locally via Ollama and has become a popular choice for powering custom AI agents and enterprise chatbots.

Google Gemini CLI

Gemini CLI is Google’s open-source contribution to the agentic coding space, bringing the Gemini multimodal model directly into developers’ terminals.

With a simple npx command, developers can chat with, instruct, and automate tasks using the Gemini model from the command line. It supports code assistance, natural language queries, integration with Google Cloud services, and can be embedded into scripts and CI/CD pipelines.

Basically, the tool abstracts away the complexity of API integration, making frontier AI capabilities immediately accessible from any terminal environment. Common uses include AI-assisted coding, command-line automation, batch file processing, and rapid prototyping.

RAGFlow

RAGFlow is an open-source retrieval-augmented generation engine that combines advanced RAG techniques with agentic capabilities to create a robust context layer for LLMs.

The platform provides an end-to-end framework covering document ingestion, vector indexing, query planning, and tool-using agents that can invoke external APIs beyond simple text retrieval. It also supports citation tracking and multi-step reasoning, which are critical for enterprise applications where answer traceability matters.

As organizations move beyond basic chatbots toward production AI systems, RAGFlow addresses the hardest challenge: making AI answers grounded, traceable, and reliable. With 70k+ stars, it has become a key infrastructure component for enterprise knowledge bases, compliance-focused AI, research assistants, and multi-source data analysis workflows [10].

Claude Code

Claude Code is Anthropic’s agentic coding tool that operates from the terminal. Once installed, it understands the full codebase context and executes developer commands via natural language. You can ask it to refactor functions, explain files, generate unit tests, handle git operations, and carry out complex multi-file changes, all guided by conversation.

Claude Code distinguishes itself from simpler code completion tools via its ability to reason about an entire project, execute multi-step tasks, and maintain context across long coding sessions. It operates as an AI pair programmer that is deeply aware of your project structure and can act on code independently.

Its primary applications include full-codebase refactoring, automated test generation, code review and explanation, and git workflow automation.

Key Trends Shaping This Landscape

Several common threads run through these projects:

The Local AI Revolution

Ollama, Open WebUI, and OpenClaw collectively signal a massive shift: developers want to run AI on their own hardware.

Privacy concerns, API costs, and the desire for deep customization are driving this movement. The infrastructure for self-hosted AI has matured to the point where a single command can spin up a full-featured AI platform.

Agentic AI Goes Mainstream

Nearly every repository on this list incorporates some form of autonomous agent behavior.

We have moved from AI that responds to AI that acts. Agents can now browse the web, execute code, manage files, orchestrate multi-step workflows, and even improve their own capabilities, all running continuously on your machine.

Open Models Close the Gap

DeepSeek-V3’s success proved that open-weight models can compete with the best proprietary offerings.

Combined with efficient local runtimes like Ollama, this means developers can build high-capability applications without any API dependency. This trend is reshaping the economics of AI development fundamentally.

The Rise of Visual AI Building

Langflow, Dify, and n8n show that drag-and-drop visual interfaces are becoming the preferred way to design AI agent pipelines.

This means domain experts, not just ML engineers, can create sophisticated AI applications. The barrier to entry for building production AI has never been lower.

Conclusion

The repositories profiled here are more than trending projects. They are the building blocks of a new AI infrastructure stack.

For developers, the takeaway is clear. Whether you are building an AI-powered product, automating internal workflows, or experimenting with frontier models, the tools in this list represent the most battle-tested, community-validated starting points available. Keep them on your radar. The pace of innovation in this space shows no signs of slowing down.
