
How to Filter 1 Billion Requests Without Crashing Your Database

An Intro to Bloom Filters



Imagine you are building a URL spam filter for a service like Gmail. You have a list of 100 million malicious URLs, and you need to check every single link clicked by your 1 billion active users against this list.

The obvious approach is to store the bad URLs in a SQL database and query it every time a user clicks a link.

If you try this in production, your infrastructure will crash in minutes.

In this post, I want to explore why the "obvious" solution fails at scale and how a probabilistic data structure called a Bloom Filter solves the problem by trading a tiny bit of accuracy for massive speed.

The Challenge: The "Scale" Wall

When you are dealing with web-scale traffic, you hit three distinct bottlenecks:

  1. Storage: A blacklist of 100 million URLs takes up about 10 GB of space. That’s far too big for CPU caches and expensive to hold in RAM on every application server, forcing lookups onto slower storage.

  2. Throughput: With 1 billion daily active users, you might face peaks of 30,000 queries per second (QPS). A standard database instance simply cannot handle that many read operations without an expensive cluster.

  3. Latency: Users expect instant page loads. A round-trip to a database (network + disk I/O) takes about 2 milliseconds. That sounds fast, but in the world of high-frequency requests, it’s an eternity.
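The numbers above come from simple back-of-envelope arithmetic. The sketch below reproduces them; the average URL length and clicks-per-user figures are assumptions chosen to match the post's estimates, not measured values.

```python
# Back-of-envelope math behind the three bottlenecks.

NUM_BAD_URLS = 100_000_000
AVG_URL_BYTES = 100  # assumed average stored-URL size

storage_gb = NUM_BAD_URLS * AVG_URL_BYTES / 1e9
print(f"Blacklist size: ~{storage_gb:.0f} GB")  # ~10 GB

DAILY_USERS = 1_000_000_000
CLICKS_PER_USER = 2.6  # assumed daily clicks per user
qps = DAILY_USERS * CLICKS_PER_USER / 86_400  # seconds per day
print(f"Load: ~{qps:,.0f} QPS")  # ~30,000 QPS
```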

But here is the real kicker: 99% of your traffic is safe.

If you use a traditional database, you are wasting expensive computing resources to check 990 million "safe" links every day. You are effectively burning money to answer "No" over and over again.

The Solution: The Bloom Filter

A Bloom Filter is a space-efficient probabilistic data structure that tells you whether an element might be in a set or is definitely not in the set.

Think of it as a highly efficient gatekeeper. Instead of storing the full URL string (which is heavy), we project it onto a compact array of bits (0s and 1s) using hash functions.
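To make the "project it onto bits" idea concrete, here is a minimal sketch of a Bloom filter. It is illustrative only: it derives its k bit positions from a SHA-256 digest for simplicity, whereas a production filter would use fast non-cryptographic hashes (e.g. MurmurHash).

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions each set one bit per item."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)  # compact bit array

    def _positions(self, item: str):
        # Derive k positions from two base hashes (double hashing).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means *definitely not* in the set; True means *maybe*.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Adding a URL sets a handful of bits; a lookup checks those same bits. If any one of them is still 0, the URL was definitely never added.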

How It Changes the Architecture

We place the Bloom Filter in the memory (RAM) of the application server itself. Before we ever bother the database, we ask the filter: "Have you seen this URL?"

  1. If the Filter says "No": The URL is 100% safe. We let the user proceed immediately. No database call. Zero network latency.

  2. If the Filter says "Maybe": It might be a malicious URL, or it might be a "False Positive." Only then do we check the database to confirm.
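The two-step flow above can be sketched as a single function. The `bloom_check` and `db_lookup` callables here are hypothetical stand-ins for the in-memory filter and the authoritative database.

```python
def is_malicious(url, bloom_check, db_lookup) -> bool:
    """Gatekeeper flow: consult the in-memory filter first,
    and hit the database only on a 'maybe'."""
    if not bloom_check(url):
        return False       # definitely not blacklisted: zero DB calls
    return db_lookup(url)  # maybe: confirm, weeding out false positives
```

The database is the source of truth, so a false positive from the filter costs only one extra lookup; it never mislabels a safe URL as malicious.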

The Impact: By The Numbers

The difference between the "Traditional Approach" and the "Bloom Filter Approach" is not incremental; it's orders of magnitude.

| Metric | Traditional DB Lookup | With Bloom Filter | Improvement |
| --- | --- | --- | --- |
| Storage (RAM) | ~10 GB | ~125 MB | 99% smaller |
| Database load | 30,000 QPS | ~300 QPS | 99% less load |
| Latency | ~2 milliseconds | Nanoseconds | ~20,000x faster |

By accepting a standard 1% false positive rate, we instantly eliminate 99% of the unnecessary traffic that would otherwise hit our database. We shrink a 10 GB problem into a ~125 MB solution that fits in the RAM of a cheap laptop.
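The ~125 MB figure is not arbitrary: it follows from the standard Bloom filter sizing formulas, m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions. Plugging in n = 100 million and p = 1% gives roughly 120 MB and 7 hash functions:

```python
import math

n = 100_000_000  # items (malicious URLs)
p = 0.01         # target false-positive rate

# Standard optimal-sizing formulas for a Bloom filter
m = -n * math.log(p) / math.log(2) ** 2  # total bits
k = (m / n) * math.log(2)                # hash functions

print(f"~{m / 8 / 1e6:.0f} MB, {round(k)} hash functions")
```

Roughly 9.6 bits per element, regardless of how long the URLs themselves are: that independence from item size is where the 99% storage saving comes from.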

Industry Adoption

  • Google (Safe Browsing)

  • Medium (Feed Deduplication)

  • Cassandra (Database Speed)

  • Akamai (CDN Caching)

  • Bitcoin (Wallet Syncing)

Conclusion: The Power of "Good Enough"

In system design, we are often taught that accuracy is paramount. But Bloom Filters teach us that sometimes, knowing what is definitely NOT true is just as valuable as knowing what IS true.

If you can tolerate saying "Maybe" occasionally, but can never afford to be slow, use a Bloom Filter.

For a deeper dive into how Bloom filters work, with an interactive demo, check out https://samwho.dev/bloom-filters.