How I solved a distributed queue problem after 15 years | DBOS
Learn how queues make horizontal scaling, scheduling, and flow control easier in cloud systems, and how to make them durable and observable.
I share interesting articles, videos, papers and more about distributed systems, formal methods and computer science.
We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is distributed consensus. To build the foundation on distributed consensus, we will go over Paxos. Paxos revolutionized distributed computing by providing the first provably correct solution for achieving consensus among unreliable processors, forming the theoretical foundation for modern distributed systems and databases. Paxos is one of the most important and most difficult-to-understand algorithms. In this blog I will simplify and explain Paxos in a very intuitive way.
Murat Demirbas (https://muratbuffalo.blogspot.com) and Aleksey Charapko (https://charap.co) read and discuss "Real Life Is Uncertain. Consensus Should Be Too...
This is definitely not a "learn distributed systems in 21 days" post. I recommend a principled, from the foundations-up, studying of distrib...
The consensus problem involves an asynchronous system of processes, some of which may be unreliable. The problem is for the reliable processes to agree on a binary value. In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process. By way of contrast, solutions are known for the synchronous case, the “Byzantine Generals” problem.
AWS Senior Principal Engineers, Niko Matsakis and Marc Bowes, take us inside Aurora DSQL's development: scaling write operations without two-phase commit, overcoming garbage collection hurdles, and embracing Rust for both data and control planes.
Here at decentralized thoughts, we spend a lot of time reasoning about distributed protocols. Often, we focus on solving distributed consensus. Personally, it’s my favorite CS problem, but it’s also famously one of the most difficult and subtle problems in distributed computing. Reasoning about distributed algorithms is hard at the...
In this blog I will go over how Apache Iceberg contributes to the performance of compute engines. Apache Iceberg is an ACID table format designed for large-scale analytics workloads. While its consistency and schema evolution features are covered in a previous blog, its impact on query performance can be equally transformative. By the end of this document, you will have a deep understanding of how Iceberg enhances performance, the trade-offs involved, and best practices for maximizing efficiency in read-heavy workloads.
Debugging concurrency bugs is no picnic, but we're going to get into it. Enter Fray, a deterministic concurrency testing framework from CMU’s PASTA Lab, that turns flaky failures into reliably reproducible ones.
To me it’s clear that the big idea there isn’t lightweight processes and message passing, but rather the generic components which in Erlang are called behaviours.