Welcome to Distributed Bytes!

I share interesting articles, videos, papers and more about distributed systems, formal methods and computer science.

Made with by Federico Ponzi

Recent Posts

Scaling PostgreSQL

How OpenAI scales PostgreSQL

Scaling PostgreSQL

Scaling PostgreSQL to power 800 million ChatGPT users

For years, PostgreSQL has been one of the most critical, under-the-hood data systems powering core products like ChatGPT and OpenAI’s API. As our user base grows rapidly, the demands on our databases have increased exponentially, too. Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.

Scaling PostgreSQL to power 800 million ChatGPT users

SpiceDB Documentation - Authzed Docs

Welcome to the SpiceDB and AuthZed docs site.

SpiceDB Documentation - Authzed Docs

On Idempotency Keys - Gunnar Morling

In distributed systems, there’s a common understanding that it is not possible to guarantee exactly-once delivery of messages. What is possible though is exactly-once processing. By adding a unique …

On Idempotency Keys - Gunnar Morling

What Does Write Skew Look Like?

This post is about gaining intuition for Write Skew, and, by extension, Snapshot Isolation. Snapshot Isolation is billed as a transaction isolation level that offers a good mix between performance and correctness.

What Does Write Skew Look Like?

How to do distributed locking — Martin Kleppmann’s blog

Redis has been gradually making inroads into areas of data management where there are stronger consistency and durability expectations – which worries me, because this is not what Redis is designed for. Arguably, distributed locking is one of those areas. Let’s examine it in some more detail.

How to do distributed locking — Martin Kleppmann’s blog

Reproducing the AWS Outage Race Condition with a Model Checker | Waqas Younas' blog

As a small experiment, we’ll use a model checker to see how such a race could happen. Formal verification can’t prevent every failure, but it helps us think more clearly about correctness and reason about subtle concurrency bugs.

Reproducing the AWS Outage Race Condition with a Model Checker | Waqas Younas' blog

TLA+ Modeling of AWS outage DNS race condition

On Oct 19–20, 2025, AWS’s N. Virginia region suffered a major DynamoDB outage triggered by a DNS automation defect that broke endpoint resol...

TLA+ Modeling of AWS outage DNS race condition

TernFS: an exabyte scale, multi-region distributed filesystem | XTX Tech Blog

This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.

TernFS: an exabyte scale, multi-region distributed filesystem | XTX Tech Blog

Just make it scale: An Aurora DSQL story | All Things Distributed

The goal with Aurora DSQL’s design is to break up the database into bite-sized chunks with clear interfaces and explicit contracts. Each component follows the Unix mantra—do one thing, and do it well—but working together they are able to offer all the features users expect from a database (transactions, durability, queries, isolation, consistency, recovery, concurrency, performance, logging, and so on).

Just make it scale: An Aurora DSQL story | All Things Distributed