Writing a hashmap in c

The code for this project can be found on github

Hashmaps(associative arrays) are a type of data structures which store key/value pairs. A key is usually a type(usually a string) and the value some type. For example a kv pair which maps strings to integers could be represented by "k" -> [0,1,2,3,(4)], when the key "k" is hashed, it maps to the value (4). Interestingly, the c standard library does not provide an implementation. so let's write one:

Basic structure

First let's start by representing a simple key/value pair:


typedef struct KeyValuePair {
	char *key;
	int value;
	struct KeyValuePair *next;
        } KeyValuePair;
          

for now it's just a string->int kv pair. Next we define our hashmap:


          typedef struct {
            HashFunction hashfunction; 
            KeyValuePair *buckets[HASHMAP_SIZE]; 
          } HashMap;
          

This will hold our keyValuePairs and use a user-supplied hash function to compute the index into the array of key value pairs in order to perform an operation. Speaking of which... let's define our operations:


          HashMap* createHashMap();
          void freeHashMap(HashMap *hmap);
          int hash(const char *key);
          void put(HashMap *map, const char *key, int value);
          int get(const HashMap *map, const char *key);
          

These functions allow us to support the basic requirements of the data structure: inserting a value and getting that value (if it exists)

We won't concern ourselves with allocating and deallocating memory, you can checkout the source code for more details, what we will focus on instead is how to resolve a collision when one occurs. It's possilbe that the hash function computes the same index for different items, this is known as a collision. There are different ways of resolving collisions. First let's take a look at chaining.

Chaining

With this method of handling collisions, each bucket in the hashmap is backed by a linkedlist.

checkout this commit for the source


          void put(HashMap *map, const char *key, int value) {
            int index = hash(key); 
            //collision
            KeyValuePair* kvp = createKeyValuePairWithValues(key, value);
            if(map->buckets[index] == NULL) {
              map->buckets[index] = kvp;
            }
            else {
              KeyValuePair* current = map->buckets[index];
              while(current->next){
                current = current->next; 
              }
              current->next = kvp;
            }
        }
          

As you can see, should we index into the bucket and it is not empty, that means we have a collision. When a collision occurs, we walk the list and insert at the end of the list.

As a consequence, we must adapt our get function to search the linkedlist for the key:


          int get(const HashMap *map, const char *key) {
            int index = hash(key);
            KeyValuePair* current = map->buckets[index];
            KeyValuePair* next = current;
            while(next && strcmp(current->key, key) != 0) {
              current = next;
              next = next->next;
            }
            int value = current->value;
            return value;
          }
          

Open addressing

Open addressing is another method of collision resolution. Instead of using another datastructure which can handle multiple values, ie a linkedlist as in the above chaining example, open addressing moves to the next available empty slot in the associative array. There are several methods of determining where to find the next slot in the array such as:

  • linear probing: just go to the next free index, wrapping around when reaching the end.
  • Quadratic probing: use quadratic function to determine the next index
  • double hashing: use another hash function if a collision occurs

below is a simple linear probing example:


          void put(HashMap *map, const char *key, int value) {
            int index = hash(key); 
            KeyValuePair* kvp = createKeyValuePairWithValues(key, value);
            if(map->buckets[index] == NULL) {
              map->buckets[index] = kvp;
              return;
              }
              //collision
            else {
              int startIndex = index;
              startIndex++;
              while(startIndex != index && map->buckets[startIndex] != NULL) {
                startIndex = (startIndex + 1 % HASHMAP_SIZE);
        }
        map->buckets[startIndex] = kvp;
    }
}
          

Make it generic

In order to support other types, it is mostly a case of providing your size_t and replacing occurances of int with void*

see this commit for more details


          typedef int (*HashFunction)(const char* key, size_t hmap_size);

          typedef struct KeyValuePair {
	    char *key;
	    void *value;
            size_t size;
          } KeyValuePair;

          typedef struct {
          size_t hashmap_size;
          HashFunction hashfunction; 
          KeyValuePair **buckets; 
          } HashMap;

          HashMap* createHashMap();
          void freeHashMap(HashMap *hmap);
          int hash(const char *key, size_t hashmap_size);
          void put(HashMap *map, const char *key, void* value, size_t size);
          void* get(const HashMap *map, const char *key);
          

concurrency to follow!