How OpenAI Serves 800 Million Users Without Sharding Postgres

Source: DEV Community
OpenAI recently published a blog post titled Scaling PostgreSQL to Power 800 Million ChatGPT Users, and it’s one of those posts that I think anyone working with databases or distributed systems should read. Not because the techniques are new or groundbreaking, but because of what OpenAI chose not to do. No sharding. No fancy distributed databases. No custom storage engine. Just a single PostgreSQL primary, roughly 50 read replicas, and a lot of operational discipline.

When I first read this, I was struck by how much restraint OpenAI showed in every decision. As engineers, we often reach for the fancy solution that looks cool and is fun to talk about and show off, but that doesn’t mean it’s the best solution. The goal of this post is to walk through what OpenAI did, why it works, and what we can take away from it as engineers building systems at scale.

Definitions and Background

Before getting into the specifics of OpenAI’s setup, let’s define a few things that’ll come