Dictionary Implementation

Okay, time to get into the nitty-gritty! This is an advanced topic and an understanding of generics and optionals strongly recommended. Also, an understanding of protocols would be great.

If you don't want to hear my long-winded explanation of hash maps, or you think you're an amazing Swifty, go see Swift's scary implementation:

Okay, for those of you that want an explanation, or got scared away from ever reading Swift's documentation, here we go! Sorry quick disclaimer, this is not how Swift does it. This is how to get an intuition of how dictionaries work. Also, I will be naming my data type HashMap instead of Dictionary because Swift's is called Dictionary and my playground is being fickle.

As I mentioned before, hash maps leverage a hash function that converts a key into an integer. These are great because we can then shove all of our pairs into an array and get them by their converted key. This means our get and insert are O(1). If you are interested in other methods and the derivation of hash maps, I heavily suggest 3110. It does a great job and is probably a course that you would enjoy.

Anyways, these hash functions are guaranteed by the Hashable, so let's leverage that. We can create a generic struct with a Key and Variable type as we normally would. We can further specify that Key conforms to hashable, with the where clause or with a :.

struct Dictionary<Key, Value> where Key: Hashable { ... } 
// I think this one is sleaker
struct HashMap<Key: Hashable, Value> { ... } 
// But, this one is easier to understand

Also, I am using a struct because that's what Swift uses! Alright, onto the next thing. I am going to be storing key-value pairs in an array. Accordingly, I am going to have to write the type (Key, Value) quite a lot, and I am lazy. Hence, I will rename it Element.

typealias Element = (key: Key, value: Value)

Note that users may not know what this type is, so it's good to rename the typical 0, 1 to kev and value.

Okay, so we're going to go back into a little theory before the rest of the implementation. As I said, we are going to be storing our Elements into an array. This is going to be a special array because I want to have a fixed length. Why? Hash Functions.

Hashing Explanation

We can think of hash functions as a black box that takes in a key and produce some random, potentially massive, number. If we want this arbitrarily large number to fit into an array, we could directly have that correspond to an index in the array, but that's going to be huge! So, instead, we shorten the array and the index accordingly with a modulo.

Think of modulo as a remainder function denoted with %:

extension Int {
    
    static func % (_ value: Int, cap: Int) -> Int {
        let div = value / cap
        let remainder = value - div * cap
        return remainder
    }
    
}

Anyways, we use the modulo operator to keep the index within the boundaries of the array.

We're going to call the length (array.count - 1) of the array, the capacity. Thus, the index of a key-value pair will be key.hashValue % capacity.

Buckets, Count, & Capacity

You may have figured out that we are going to have overlaps at certain indices. We are going to do two things to combat this:

  1. We will store buckets of values per each value

    i.e. This will be a 2D array of Elements

  2. We will resize the array (by a factor of 2) every time the number of elements exceeds the capacity

First, we'll add capacity. That's just going to be an integer that starts at 0.

var capacity: Int = 0

Great, now onto count. I am going to update count, but instead, you could make it a computed property and monitor it to see if count exceeds capacity. The advantage of manually updating it is that I don't need to "monitor it".

Let's add count with a didSet to handle the load factor (count / capacity) for us:

 var count: Int = 0 {
    didSet {
        if count > capacity {
            switch capacity {
            case 0:
                capacity = 1
            case 1:
                capacity = 3
            default:
                capacity *= 2
            }
        }
        rehash()
    }
}

You might be wondering why I go from 0 -> 1 -> 3 -> *2. I don't have a good answer for you. I did some trial and error with Swift's implementation and this is what I saw, so that's what I did.

We will implement and discuss rehash later.

Lastly, we implement our buckets:

private var buckets = [[Element]]()

Subscript

This is probably the coolest / most niche part of this data structure. We can implement subcripting–square-bracket based get and put operations–with the subscript property. Think of this kind of as a variable with a getter and a setter. We write this as:

subscript(_ key: Key) -> Value?

Notice that we get an optional value, as well as we set an optional value! Okay, first we can implement the getter:

get {
    if capacity == 0 { return nil }
    let index = key.hashValue % buckets.count
    return buckets[index].first(where: { $0.key == key })?.value
}

We check that the dictionary is not empty, so we can avoid index out of bound errors. Then, we get the index returns anything element that shares the key.

Putting is a bit more complex because we need to maintain the count variable. There are three cases:

  1. The key is unique -> we are adding a new element

  2. The key is old -> we are replacing an element

  3. The value newValue is nil -> we are removing an element (if an element exists)

To do this, I will create a helper function remove that will check if there is a value bound to a key, and if so, it will remove the pairing, decrement count, and return the value, otherwise it will return nil.

private mutating func remove(_ key: Key) -> Value? {
    if let element = self[key] {
        let index = key.hashValue % buckets.count
        buckets[index].removeAll(where: { $0.key == key })
        self.count -= 1
        return element
    } else {
        return nil
    }
}

Note: We need mutating because we are working in a struct.

For ease of use in the future, we are also going to factor out an insert function. This function, will call remove and then insert accordingly. This code structure also makes sure that count and capacity are updated before we do any extra work.

private mutating func insert(element: Element) {
    remove(element.key)
    count += 1
    let index = element.key.hashValue % buckets.count
    buckets[index].append(element)
}

With remove and insert done, we can finish our subscript:

subscript(_ key: Key) -> Value? {
    get {
        if capacity == 0 { return nil }
        let index = key.hashValue % buckets.count
        return buckets[index].first(where: { $0.key == key })?.value
    }
    set {
        if let newValue = newValue {
            insert(element: (key: key, value: newValue))
        } else {
            remove(key)
        }
    }
}

Rehashing

One problem we have is that since our placement of elements depends on the size of an array, if we resize our array, everything will be screwed up. So, every time we resize the array, we need to insert everything again. Hence, rehash:

private mutating func rehash() {
    let buckets = self.buckets
    self.buckets = Array(repeating: [], count: capacity)
    for bucket in buckets {
        for value in bucket {
            insert(element: value)
        }
    }
}

Finished!

We're done! Sure, you could add more like inits with Sequences and literals, but I am lazy and it's 2 AM, so I shall be going to sleep. Peace out y'all. Here's my finished code:

struct HashMap<Key, Value> where Key: Hashable {
    
    typealias Element = (key: Key, value: Value)
    
    var count: Int = 0 {
        didSet {
            if count > capacity {
                switch capacity {
                case 0:
                    capacity = 1
                case 1:
                    capacity = 3
                default:
                    capacity *= 2
                }
            }
            rehash()
        }
    }
    
    var capacity: Int = 0
    private var buckets = [[Element]]()
    
    var pairs: [Element] {
        var elts = [Element]()
        for bucket in buckets {
            elts += bucket
        }
        return elts
    }
    
    subscript(_ key: Key) -> Value? {
        get {
            if capacity == 0 { return nil }
            let index = key.hashValue % buckets.count
            return buckets[index].first(where: { $0.key == key })?.value
        }
        set {
            if let newValue = newValue {
                insert(element: (key: key, value: newValue))
            } else {
                remove(key)
            }
        }
    }
    
    private mutating func remove(_ key: Key) -> Value? {
        if let element = self[key] {
            let index = key.hashValue % buckets.count
            buckets[index].removeAll(where: { $0.key == key })
            self.count -= 1
            return element
        } else {
            return nil
        }
    }
    
    private mutating func insert(element: Element) {
        remove(element.key)
        count += 1
        let index = element.key.hashValue % buckets.count
        buckets[index].append(element)
    }
    
    private mutating func rehash() {
        let buckets = self.buckets
        self.buckets = Array(repeating: [], count: capacity)
        for bucket in buckets {
            for value in bucket {
                insert(element: value)
            }
        }
    }
    
}

Tests

I added the pair variable for testing purposes.

Here are some lacking tests that show that it works, I think:

enum Courses: Hashable {
    case iOS
    case Backend
    case Design
}

var courseInstructors: HashMap<Courses, [String]> = HashMap()

courseInstructors[.iOS] = ["Noah", "Justin"]
courseInstru ctors.pairs
// courseInstructors: HashMap<Courses, [String]> = { .iOS: ["Noah", "Justin"] }

courseInstructors[.iOS] = ["Noah", "Justin"]
courseInstructors.pairs
// courseInstructors: HashMap<Courses, [String]> = { .iOS: ["Noah", "Justin"] }

courseInstructors[.Backend] = ["Kate", "Shungo"]
courseInstructors.pairs
// courseInstructors: HashMap<Courses, [String]> = 
//{ .iOS: ["Noah", "Justin"], .Backend: ["Kate", "Shungo"] }

courseInstructors[.iOS] = nil
courseInstructors.pairs
// courseInstructors: HashMap<Courses, [String]> = { .Backend: ["Kate", "Shungo"] }

Last updated