Why don't sets store data linearly in languages like Python
You must have read about Hashmaps and hashsets in your academic years, hashing is a powerful tool that is used everywhere. It keeps your passwords safe, and makes your `.get()` from a hashmap O(1).
Like when you use sets in python, you'll notice that they don't always return in the same order. It feels as if it is random.
TL;DR about hashing
If you are already aware of hashing, you can skip to HashDoS/Hash Flooding.
Think of hash like a function that when you input some Key and it outputs a result of fixed size. This function is also consistent, meaning the output would stay the same for the said input.
The problem with hashes
There's an issue with that tho, notice I said "result of fixed size". Let's assume your hash function outputs a string of length 12. Let the allowed character set be of length m, then all the possible outputs that it can have would be m!*12. It feels like a big number but for small length hashes like MD5, it is not so big. Your keys will start generating the same output. We have collisions.
Collision Resolution
We implement different techniques to deal with this, you can read about collision resolution in detail if you wish. For this context, we will talk about separate chaining.
Separate Chaining
So we have two inputs that map to the same output. We need to store both, but we can't store two at the same place. What we do is we create a linked-list from that node. Linked-lists give us easy insertions and deletions and can be of variable length. So we just extend the linked-list of a hash node if collision occurs. This is amazing right?
HashDoS/Hash Flooding
The end user passes different inputs such that they hash to the same value and get stored in a linked list. With enough such entries, the end user can intentionally make the request take linear time instead of the usual constant time that it usually takes. This can be extremely expensive, causes excessive CPU usage which can be taxing and even render the server unresponsive - a classic denial of service scenario.
You might be using the default hashset of the language you are programming in, and if it follows the linked list implementation, your program could be a victim to such an attack. Many languages have long adapted for such attacks by changing the implementation, like Java now uses balanced trees (e.g red-black trees), which ensured that even the worst case grow logarithmically.
This can also be prevented by using a better hashing algorithm like SHA-256, which makes it much harder to find multiple keys that map to same value.