Dictionary Implementation
Last updated
Last updated
Okay, time to get into the nitty-gritty! This is an advanced topic and an understanding of generics and optionals strongly recommended. Also, an understanding of protocols would be great.
If you don't want to hear my long-winded explanation of hash maps, or you think you're an amazing Swifty, go see Swift's scary implementation:
Okay, for those of you that want an explanation, or got scared away from ever reading Swift's documentation, here we go! Sorry quick disclaimer, this is not how Swift does it. This is how to get an intuition of how dictionaries work. Also, I will be naming my data type HashMap
instead of Dictionary
because Swift
's is called Dictionary
and my playground is being fickle.
As I mentioned before, hash maps leverage a hash function that converts a key into an integer. These are great because we can then shove all of our pairs into an array and get them by their converted key. This means our get and insert are O(1)
. If you are interested in other methods and the derivation of hash maps, I heavily suggest 3110. It does a great job and is probably a course that you would enjoy.
Anyways, these hash functions are guaranteed by the Hashable
, so let's leverage that. We can create a generic struct
with a Key
and Variable
type as we normally would. We can further specify that Key
conforms to hashable, with the where
clause or with a :
.
Also, I am using a struct
because that's what Swift
uses!
Alright, onto the next thing. I am going to be storing key-value pairs in an array. Accordingly, I am going to have to write the type (Key, Value)
quite a lot, and I am lazy. Hence, I will rename it Element
.
Note that users may not know what this type is, so it's good to rename the typical 0, 1 to kev
and value
.
Okay, so we're going to go back into a little theory before the rest of the implementation. As I said, we are going to be storing our Element
s into an array. This is going to be a special array because I want to have a fixed length. Why? Hash Functions.
We can think of hash functions as a black box that takes in a key and produce some random, potentially massive, number. If we want this arbitrarily large number to fit into an array, we could directly have that correspond to an index in the array, but that's going to be huge! So, instead, we shorten the array and the index accordingly with a modulo.
Think of modulo as a remainder function denoted with %
:
Anyways, we use the modulo operator to keep the index within the boundaries of the array.
We're going to call the length (array.count - 1
) of the array, the capacity. Thus, the index of a key-value pair will be key.hashValue % capacity
.
You may have figured out that we are going to have overlaps at certain indices. We are going to do two things to combat this:
We will store buckets
of values per each value
i.e. This will be a 2D array of Element
s
We will resize the array (by a factor of 2) every time the number of elements exceeds the capacity
First, we'll add capacity
. That's just going to be an integer that starts at 0.
var capacity: Int = 0
Great, now onto count. I am going to update count
, but instead, you could make it a computed property and monitor it to see if count
exceeds capacity
. The advantage of manually updating it is that I don't need to "monitor it".
Let's add count
with a didSet
to handle the load factor (count
/ capacity
) for us:
You might be wondering why I go from 0 -> 1 -> 3 -> *2. I don't have a good answer for you. I did some trial and error with Swift's implementation and this is what I saw, so that's what I did.
We will implement and discuss rehash
later.
Lastly, we implement our buckets:
This is probably the coolest / most niche part of this data structure. We can implement subcripting–square-bracket based get and put operations–with the subscript
property. Think of this kind of as a variable with a getter and a setter. We write this as:
subscript(_ key: Key) -> Value?
Notice that we get an optional value, as well as we set an optional value! Okay, first we can implement the getter:
We check that the dictionary is not empty, so we can avoid index out of bound errors. Then, we get the index returns anything element that shares the key.
Putting is a bit more complex because we need to maintain the count
variable. There are three cases:
The key is unique -> we are adding a new element
The key is old -> we are replacing an element
The value newValue
is nil
-> we are removing an element (if an element exists)
To do this, I will create a helper function remove
that will check if there is a value bound to a key, and if so, it will remove the pairing, decrement count, and return the value, otherwise it will return nil.
Note: We need mutating
because we are working in a struct
.
For ease of use in the future, we are also going to factor out an insert
function. This function, will call remove
and then insert accordingly. This code structure also makes sure that count
and capacity
are updated before we do any extra work.
With remove
and insert
done, we can finish our subscript:
One problem we have is that since our placement of elements depends on the size of an array, if we resize our array, everything will be screwed up. So, every time we resize the array, we need to insert everything again. Hence, rehash
:
We're done! Sure, you could add more like init
s with Sequence
s and literals, but I am lazy and it's 2 AM, so I shall be going to sleep. Peace out y'all. Here's my finished code:
I added the pair
variable for testing purposes.
Here are some lacking tests that show that it works, I think: