Instagram is Lying to you, and you'll love them

Arjun Sharma24-12-2025

You remember when you were first creating your account on Instagram, you typed in that sick ass username. You were so excited, you had imagined how much aura you would have with that sick username. But then instagram flashbanged you with a Username already taken.

You broke down into tears, thinking you'd always be the loser with 1234 or _ in your username. You could never be @ass

What if I told you, maybe the username you typed was available? What if Instagram lied to you? What if I told you it didn't know for sure, it was just a calculated guess?

The Calculated Guess

The thing they use to make this calculated guess is called a Bloom Filter. They call it a "Probabilistic Data Structure", which is a clever way of saying "We're just out here guessing"

To understand what it really is, you need to first know what hashing is. You can read a quicky about that in another blog I wrote about HashDoS. Moving on, imagine it being an array of bits and of length n. 1s and 0s only. You are given a string, and you need to store in inside that bitarrayin a way that is unique to that string. Well if you get your horses running, you'll remember that we can use hashing and some modulo arithmetic to achieve this.

We hash the string - get a number - modulo it to get the index - set the bit at that index to 1. If the bit is not set, we know the string is definitely not in the set.

But then you remember something called hash collisions. Uff, what if several keys reach the same index when we modulo them..ughh..thats a bummer. But wait, I don't see to just hash it once, what if I hash it twice, or even thrice and store all those indexes in the bitarray. We can use multiple hash functions to reduce the chances of exact collisions.

Keyword: "reduce the chances of exact collisions". Reduce is different from none. We can't guarantee that collisions will not occur, a string that does not exist in the set will still have a chance of being marked as present. But the best part is that we can gurantee that a string doesn't exist if any of its bits is 0.

Below is an interactive demo for you to play with your personal little 16bit bloom filter.

Bit Array (16 bits)

0/16 filled (0%)
00
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150

Color Legend

New word added
Might already exist
Check: possibly exists
Check: definitely not

Tip: Try adding words like "cat", "dog", "fish" then check for "bird" (definitely not there) or try adding the same word twice.

Using 3 hash functions with a 16-bit array. False positive rate increases as the filter fills up!

Why still use it?

The answer is simple. It is fast as fuck. Infinite times faster than querying the DB and comparing that shit. It is also space efficient, it's just bits. It will also never give you a false negative. It will never tell you a string doesn't exist if it does. Yes it can fumble and tell you a string exists if it doesn't. But that's okay, because it's fast and space efficient. It is best for usecases where the exactness doesn't matter of positives, but negatives are crucial.

Prime example would be you setting your username. You can never enter a username and have the bloom filter tell you it doesn't exist when it does.

There is also a formula to calculate the optimal size of the bloom filter and the number of hash functions to use such that the false positive rate is minimum. You can check that and a implementation of the same in Python on Github by clicking Arjunsharmahehe/bloom-filter