Operating System Development Series

This series is intended to demonstrate and teach operating system development from the ground up.

26: Memory Allocation

by Mike, 2016

“Controlling complexity is the essence of computer programming.” -Brian W. Kernighan

1. Introduction

Welcome!

In the previous chapter, we built a working multitasking operating system with tasks running as kernel mode processes. Throughout this entire series we have been avoiding a topic that we perhaps should have introduced earlier. Unfortunately, we cannot go much further without it: memory allocation. We have also been too lenient on other topics, such as page swapping, recursive page directories, and address space management. With a proper memory allocator, we can implement a kernel heap. With a good design, we can also extend the physical memory manager further to improve page allocation and support zone allocation. The topics on virtual memory are equally important. There is a big problem with the current virtual memory manager that will quickly reveal itself if we remove the identity mapping. We will address this problem later when we introduce recursive page directories. Finally, we need a nice way to manage a process's virtual address space and working set list. These will allow us to map what we need properly and to support multi-threaded user mode tasks.

We will first dive into allocation techniques. Some of these we have introduced earlier, but we will present them a little differently here. These include the linear allocator, free list, free stack, and bitmap. We will then look at two more advanced allocators: the buddy allocator and the slab allocator. The slab allocator is the most complex allocator presented in this chapter.

We will then dive into allocator design. We will look at building kernel heap pools, physical page allocation, virtual page allocation, and how everything works together.

In the next chapter, we will dive into address space management issues, including page swapping, page files, recursive page directories, and address space structures.

2. General Allocators

Linear Allocator

The first type of allocator that we want to present is a linear allocator. This is the simplest type of allocator to implement. Allocations are very fast and run in constant time. It does not support freeing individual allocated blocks. Instead, the entire buffer gets freed when you are done with all of it.

For example, assume start points to the beginning of the buffer and end points to the end of the buffer. Further, assume Current Offset stores the current byte offset from the start of the buffer. If we want to allocate n bytes from this buffer, all we need to do is return a pointer to the current location and update Current Offset += n.


Translating into code we get the following.

uint8_t* _currentOffset;
uint8_t* _memoryEnd;
uint8_t* _memoryStart;

void* alloc(size_t size, align_t align) {

   void* memory;

   _currentOffset = alignup(_currentOffset, align);
   if (_currentOffset + size > _memoryEnd)
      return NULL;
   memory = _currentOffset;
   _currentOffset += size;
   return memory;
}
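A complete, self-contained version of this bump allocator might look like the following sketch. The names (`linear_alloc`, the 512-byte `_buffer`) and the `alignup` helper are illustrative, not part of the series' own code.

```c
#include <stddef.h>
#include <stdint.h>

/* Buffer and cursor; all names here are illustrative. */
static uint8_t  _buffer[512];
static uint8_t* _currentOffset = _buffer;
static uint8_t* _memoryEnd     = _buffer + sizeof(_buffer);

/* Round a pointer up to the next multiple of align (align must be a power of two). */
static uint8_t* alignup(uint8_t* p, size_t align) {
   uintptr_t a = (uintptr_t)p;
   return (uint8_t*)((a + align - 1) & ~(uintptr_t)(align - 1));
}

void* linear_alloc(size_t size, size_t align) {
   uint8_t* aligned = alignup(_currentOffset, align);
   if (aligned + size > _memoryEnd)
      return NULL;                 /* out of space */
   _currentOffset = aligned + size;
   return aligned;
}
```

Note there is no matching free function: the only way to reclaim memory is to reset `_currentOffset` back to the start of the buffer, discarding everything at once.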

Free Lists

The Free List is a linked list of free blocks of memory. Typically we would divide the memory into evenly sized blocks of n bytes. We can then build a linked list of these free blocks inside of the buffer itself. If a block is free, it is in the list. If it is not free, we don't need to worry about managing it.

So we need a linked list that is compatible with any type of block in the free list. We don't know nor care about the data in these blocks, but we do still need to store the list pointers somehow. So we will introduce a new structure, LIST_ENTRY, as follows.

typedef struct _LIST_ENTRY {
   struct _LIST_ENTRY* next;
   struct _LIST_ENTRY* prev;
}LIST_ENTRY;

Using LIST_ENTRY, we can store the forward and back links needed to construct the linked list anywhere we want. We can maintain a linked list only – which is what we want.

The data structure above though has some interesting properties. If we were to use it in some structure, we can get that structure given only a LIST_ENTRY pointer and the address of the LIST_ENTRY in the parent structure. For example,

typedef struct _FOO {
   int data1;
   int data2;
   LIST_ENTRY listEntry;
   int data3;
}FOO;
FOO bar;

Then &bar gives us the base address of the FOO object. We also know &bar->listEntry, and we know the offset of listEntry within FOO: it is &((FOO*)0)->listEntry. So subtracting that offset from &bar->listEntry points us back at the beginning of the parent structure FOO. This is how the CONTAINING_RECORD and OFFSETOF macros work. We can use CONTAINING_RECORD to get the parent structure of a LIST_ENTRY.
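A minimal sketch of these two macros, implemented with the null-pointer trick just described, could look like this (the macro bodies are one common way to write them, not necessarily identical to any particular system's headers):

```c
#include <stddef.h>

typedef struct _LIST_ENTRY {
   struct _LIST_ENTRY* next;
   struct _LIST_ENTRY* prev;
} LIST_ENTRY;

/* Byte offset of a field within a structure, via the null-pointer trick. */
#define OFFSETOF(type, field) ((size_t)&(((type*)0)->field))

/* Recover the parent structure from a pointer to one of its fields. */
#define CONTAINING_RECORD(address, type, field) \
   ((type*)((char*)(address) - OFFSETOF(type, field)))

typedef struct _FOO {
   int data1;
   int data2;
   LIST_ENTRY listEntry;
   int data3;
} FOO;
```

Given only a pointer to `bar.listEntry`, `CONTAINING_RECORD(&bar.listEntry, FOO, listEntry)` returns `&bar`.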

Okay, so a free list is just a linked list of free blocks in memory. The following figure illustrates this.


An interesting problem with free lists is how and where to allocate the LIST_ENTRY itself. At first, this appears to be a chicken and egg problem. However, since we are only using the free list to link known free blocks together, we can just use those same free blocks to store our LIST_ENTRY. In all of the figures throughout this chapter, we illustrate a LIST_ENTRY using a green rectangle. In the above figure, we have a single linked list with only a next pointer that points to LIST_ENTRY objects that are stored inside of the free blocks themselves.

To make things simple, it is helpful to store the LIST_ENTRY right at the start of every free block. This makes allocation and freeing much easier. Since all blocks are the same size, a block that is not entirely used exhibits internal fragmentation. However, allocating and freeing blocks is really fast and runs in constant time, O(1).

Setup

We will need a memory buffer and a pointer to it. For example, let's create a buffer of 512 bytes that will store our allocation blocks, each of which will be 32 bytes. We will call _aligned_malloc throughout the examples in this chapter to allocate our blocks of memory so that the returned memory is always page aligned. It is recommended to build the allocators in user mode to facilitate debugging. You can simply replace the call to _aligned_malloc with your own alloc_pages function later on. This function should allocate free physical pages and map them into the address space. You can also just set free_list to point to a heap store if you plan to use it as a heap allocator. In all cases, we need free memory to work with.

void* free_list = _aligned_malloc (512, PAGE_SIZE);
const int allocation_block_size = 32;

We then need to initialize the free list. Remember that each block is 32 bytes in this example, so we simply insert a LIST_ENTRY every 32 bytes to create the free list. Note how we set link->next of each entry to point to the next LIST_ENTRY.

LIST_ENTRY* link;
uint8_t* block = (uint8_t*)free_list;
int block_count = 512 / allocation_block_size;
for (int i = 0; i < block_count - 1; i++) {
   link = (LIST_ENTRY*) block;
   link->next = (LIST_ENTRY*) (block + allocation_block_size);
   block += allocation_block_size;
}
/* last entry should point to NULL. */
link = (LIST_ENTRY*) block;
link->next = NULL;

The above code creates the free list. Note that each LIST_ENTRY is located right at the start of each allocation block (in this example, each allocation unit is 32 bytes.) We use the free allocation units themselves to store the linked list.

Allocation

To allocate, remove a block from the free list and return it.

void* alloc () {

   void* allocation_unit;
   LIST_ENTRY* link;

   if (!free_list)
      return NULL; /* out of blocks. */

   /* this is the first free allocation unit. */
   allocation_unit = free_list;
   link = (LIST_ENTRY*) free_list;
   free_list = link->next;

   return allocation_unit;
}

It is important to note that we are storing LIST_ENTRY at the beginning of each allocation unit. There is no reason why we can't instead store them at the end of the allocation units or somewhere in the middle. Depending on how you structure the free list, the code will be slightly different. The important thing though is to be consistent. You might also notice that our alloc has no size parameter. Since all returned allocation units are the same size, we deemed it unnecessary.

Freeing

Given a pointer to a block, simply reinsert it into the free list.

void free (void* memory) {

   void* allocation_unit;
   LIST_ENTRY* link;

   allocation_unit = memory;
   link = (LIST_ENTRY*) allocation_unit;
   link->next = (LIST_ENTRY*) free_list;
   free_list = allocation_unit;
}

Since all allocation units in the free list are, well, free, by inserting the memory block back into the free list we have effectively made it re-allocatable. Since the memory is free, we can insert a new LIST_ENTRY at the start of the block which we can use to link it back into the free list.
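Putting the setup, alloc, and free pieces together gives a small, self-contained free-list allocator. The `freelist_*` names and the use of a static array instead of _aligned_malloc are choices made for this sketch only; a singly linked list suffices since we only ever touch the head.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct _FL_ENTRY { struct _FL_ENTRY* next; } FL_ENTRY; /* singly linked is enough */

#define BLOCK_SIZE  32
#define BLOCK_COUNT 16                 /* a 512-byte buffer, as in the text */

static uint8_t fl_buffer[BLOCK_SIZE * BLOCK_COUNT];
static void*   fl_free_list = NULL;

void freelist_init(void) {
   uint8_t* block = fl_buffer;
   FL_ENTRY* link = NULL;
   fl_free_list = fl_buffer;
   for (int i = 0; i < BLOCK_COUNT - 1; i++) {
      link = (FL_ENTRY*)block;
      link->next = (FL_ENTRY*)(block + BLOCK_SIZE);  /* link to the next block */
      block += BLOCK_SIZE;
   }
   link = (FL_ENTRY*)block;
   link->next = NULL;                  /* last block terminates the list */
}

void* freelist_alloc(void) {
   if (!fl_free_list) return NULL;     /* out of blocks */
   FL_ENTRY* link = (FL_ENTRY*)fl_free_list;
   fl_free_list = link->next;          /* pop the head of the list */
   return link;
}

void freelist_free(void* memory) {
   FL_ENTRY* link = (FL_ENTRY*)memory;
   link->next = (FL_ENTRY*)fl_free_list;  /* push back onto the list */
   fl_free_list = memory;
}
```

Both operations touch only the head of the list, which is what makes them O(1).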

Extending

We can extend the information stored with each block using CONTAINING_RECORD and a parent structure for LIST_ENTRY. For example,

typedef struct _BUFCTRL {
   int magic;
   int extra_information;
   LIST_ENTRY link;
}BUFCTRL;

We can add a similar structure at the end of each allocation unit to detect buffer overruns and corruption in internal memory structures. This is very important in kernel level code since we need to detect bugs in kernel code long before they can become catastrophic.

Free Stacks

Free Stacks are very similar to free lists. In fact, they can be implemented using free lists. The only difference is the interface. We can just implement a free list but write alloc and free a little differently. Allocating would be the same as popping an item from the stack, and freeing would be the same as pushing the item back on the stack. Since the free stack is implemented as a free list, popping an item off the stack just involves removing the first element of the list and pushing an item on the stack just involves inserting the item into the list.

Free lists and free stacks are used a lot in more advanced allocators, and you may find code that inlines these techniques within more complicated code. So it is helpful to understand the underlying theory behind why free lists are used and how they work so that you can recognize them in more complicated code.
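The stack interface described above can be sketched in a few lines. `push` and `pop` are just the free-list insert and remove operations under different names (all names here are illustrative):

```c
#include <stddef.h>

/* A free stack is just a free list with stack terminology. */
typedef struct _NODE { struct _NODE* next; } NODE;

static NODE* top = NULL;

void push(void* memory) {              /* the "free" operation */
   NODE* n = (NODE*)memory;
   n->next = top;
   top = n;
}

void* pop(void) {                      /* the "alloc" operation */
   if (!top) return NULL;
   NODE* n = top;
   top = n->next;
   return n;
}
```

Because both operations work only at the top of the stack, the most recently freed block is always the next one handed out (LIFO order), which also tends to be cache-friendly.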

Free Lists and Free Stacks as Physical Memory Allocators

Free Stacks have great use in physical memory allocators. With paging enabled, we can map any physical page to any location in the virtual address space. The actual location of the physical page does not matter in most cases; that is, when allocating we just need a page – any page. Even if we need contiguous memory, we can get any pages and map them contiguously in the address space later. Because the actual location of the page does not matter, free stacks and free lists can be a very fast and efficient way to allocate physical pages, where each allocation unit is a single page. Of course, this might present some problems in more advanced designs, for example if you want to support different page sizes (not just 4096 bytes) or must deal with hardware that requires contiguous physical memory. We can support these by introducing zones, which we will talk about in a later section.

Also note that, because we store the LIST_ENTRY for free lists or free stacks directly in the free physical pages, this presents interesting problems with designs that support paging. That is, we can't simply write or read from the memory as we did in the above code samples. They need to be mapped into the address space first. This complicates our alloc and free code slightly.

void* alloc () {

   void* allocation_unit;
   LIST_ENTRY* link;

   /* this is the first free allocation unit (a physical address). */
   allocation_unit = free_list;

   /* map it into the address space. */
   allocation_unit = map (allocation_unit);

   /* read the next pointer through the mapping. */
   link = (LIST_ENTRY*) allocation_unit;
   free_list = link->next;
   return allocation_unit;
}
void free (void* memory) {

   void* allocation_unit;
   LIST_ENTRY* link;

   /* this becomes the top of our stack. */
   allocation_unit = memory;
   link = (LIST_ENTRY*) allocation_unit;
   link->next = (LIST_ENTRY*) free_list;

   /* store the physical address, then unmap the page. */
   free_list = get_physical_address (allocation_unit);
   unmap (allocation_unit);
}

In the above example, alloc gets a physical address from the free_list. This is the first allocation unit in the list and is what we want to return. But we need to first update free_list to point to the next entry, which means we need to read the LIST_ENTRY stored in that unit. But free_list holds a physical address, not a virtual address, so we need to call map to map it into the address space first. After mapping it, allocation_unit points to the virtual address of the page we want and we can now read link->next from it. We then return the virtual address of the page we just mapped.

To free, note that memory is a virtual address. Because it is already mapped, we can just write a new LIST_ENTRY into that page as we did above. Note that because free_list already stores a physical address, the link->next line is fine. However, we need to update free_list to point to the physical address (not virtual) of memory. So we call a new function, get_physical_address, to give us the physical address. We can then unmap the memory page since we no longer need it.

The function get_physical_address can work by getting the page table index and directory table index from the virtual address itself (recall the virtual address format.) It can then look through the page tables to find the Page Table Entry (PTE)->frame field, which will be the mapped physical address. The functions map and unmap simply map and unmap a physical page into and out of the address space.
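To make the index arithmetic concrete, here is a sketch of the translation for 32-bit x86 two-level paging. To keep it self-contained, only directory entry 0 is modeled with a mock table; a real kernel would walk its actual paging structures, and the frame mask reflects the standard 4 KB PTE layout.

```c
#include <stdint.h>

#define PTE_FRAME_MASK 0xFFFFF000u     /* top 20 bits of a PTE hold the frame */

static uint32_t page_table0[1024];     /* mock page table for directory index 0 */

uint32_t get_physical_address(uint32_t virt) {
   uint32_t dir_index   = virt >> 22;             /* top 10 bits select the page table */
   uint32_t table_index = (virt >> 12) & 0x3FF;   /* next 10 bits select the PTE */
   uint32_t offset      = virt & 0xFFF;           /* low 12 bits are the page offset */
   if (dir_index != 0)
      return 0;                                   /* only directory 0 modeled here */
   return (page_table0[table_index] & PTE_FRAME_MASK) | offset;
}
```

The physical address is simply the frame from the PTE with the original 12-bit page offset pasted back on.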

Bitmaps

A bitmap is an array of single bits, where each bit represents the state of a respective allocation unit. Allocation units can either be in use (has been allocated) or free. Given these two states, we can use a single bit to represent the state of the allocation unit. We first introduced bitmaps in our physical memory management chapter; however, they are a fundamental part of a number of advanced allocators, just like free lists and free stacks.


Setup

The bitmap requires some additional space to store the bitmap itself. It also needs some memory to work with in which to store the allocation units. You can also store the bitmap within the free memory area, but you need to be careful when reserving space. For our example code, we will call _aligned_malloc again to allocate free memory that is page aligned, and we will use the entire page for our free memory. This is where we will allocate from.

void* memory_start = _aligned_malloc(PAGE_SIZE,PAGE_SIZE);

Allocating

To allocate, scan the bits in the bitmap to find a free allocation unit, mark it in use, and return it. Due to the scan, this operation has a worst case of O(n) and can be much slower than free lists and free stacks.

void* alloc() {

   index_t bit;

   for (bit = 0; bit < allocation_unit_count; bit++)
      if (bit_test (bit) == 0)
         break;

   if (bit == allocation_unit_count)
      return NULL;

   set_bit (bit); /* mark the unit in use. */
   return (void*) ((uint8_t*)memory_start + (bit * allocation_unit_size));
}

In the above, we loop through all bits in the bitmap. We call bit_test, which returns the value of the bit inside of the bitmap. If it is 0, we have found a free allocation unit. If we don't find anything, we return NULL. If we have found one, bit is the bit number in the bitmap, which is also the allocation unit number. We set the bit to mark the unit as in use, and the allocation unit itself is at memory_start + (bit * allocation_unit_size).

Freeing

To free, simply clear the bit representing the state of the allocation unit.

void free(void* memory) {

   index_t bit;

   bit = ((uint8_t*)memory - (uint8_t*)memory_start) / allocation_unit_size;
   clear_bit (bit);
}

An interesting property of bitmaps is that, unlike free lists and free stacks, they can detect multiple free attempts of the same pointer. All we need to do is check whether the bit was already clear before clearing it. Compare our examples above with our full implementation in our physical memory manager chapter.
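The pieces above fit together into the following self-contained sketch, including the double-free check just mentioned. The helper names (`bit_test`, `set_bit`, `clear_bit`) follow the text; the sizes and the `bitmap_*` wrappers are choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define UNIT_SIZE  32
#define UNIT_COUNT 128                 /* 4096-byte region, one bit per 32-byte unit */

static uint8_t  bm_memory[UNIT_SIZE * UNIT_COUNT];
static uint32_t bm_bitmap[UNIT_COUNT / 32];   /* 0 = free, 1 = in use */

static int  bit_test (size_t bit) { return (bm_bitmap[bit / 32] >> (bit % 32)) & 1; }
static void set_bit  (size_t bit) { bm_bitmap[bit / 32] |=  (1u << (bit % 32)); }
static void clear_bit(size_t bit) { bm_bitmap[bit / 32] &= ~(1u << (bit % 32)); }

void* bitmap_alloc(void) {
   for (size_t bit = 0; bit < UNIT_COUNT; bit++) {
      if (bit_test(bit) == 0) {
         set_bit(bit);                 /* mark the unit in use */
         return bm_memory + bit * UNIT_SIZE;
      }
   }
   return NULL;                        /* no free units */
}

/* Returns 0 on success, -1 if the unit was already free (double free). */
int bitmap_free(void* memory) {
   size_t bit = ((uint8_t*)memory - bm_memory) / UNIT_SIZE;
   if (bit_test(bit) == 0)
      return -1;
   clear_bit(bit);
   return 0;
}
```

Note how `bitmap_free` can report the double free, something the free-list version silently corrupts itself on.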

3. Kernel Allocators

Many memory management systems use multiple allocation techniques. We call these hybrid allocators. For example, the buddy allocator makes heavy use of either the bitmap or free list techniques discussed in the previous section. The slab allocator also makes heavy use of lists of free lists. This section covers some hybrid allocators, beginning with the buddy allocator.

Buddy Allocator

The Buddy Allocator is the first hybrid allocator that we will look at and is also one of the simplest. In order to see how it works, let's first take a single 256 byte buffer of memory and divide it by two. This gives us two smaller 128 byte buffers. If we divide them by two again, we get four 64 byte buffers. Finally, we will divide by two one more time to get eight 32 byte buffers. We will also number each block. Please see the following figure.



Each time we divide a block into two smaller blocks (for example, we divided block 0 to get block 1 and block 2) we will call these smaller blocks buddies of each other. So, for example, block 1 and block 2 are buddies. Blocks 3 and 4 are buddies. Blocks 5 and 6 are buddies. But blocks 4 and 5 are not buddies. Given any block, we can find its buddy by looking at its block number as shown above. By finding the buddy, we can combine the blocks back together. So, for example, if we divided block 1 (128 bytes) into block 3 and block 4 (both 64 bytes), blocks 3 and 4 are buddies. Knowing this, we can combine block 3 and block 4 back into block 1. In other words, we can divide a larger block into two and also combine those two smaller blocks back into that larger block.

You might have noticed that in the above all we did was keep dividing the buffer by two. This gives us some interesting properties:

Blocks Per Level = 2^level
Size of Blocks at Level = Total Buffer Size / Blocks Per Level
Index in Level Of Pointer = Pointer / Size of Blocks at Level

For example, the number of blocks in level 3 is 2^3 = 8. The size of the blocks at that same level is 256 / 8 = 32 bytes. For the last example, let's say we had a pointer at byte 96 in the buffer. Then we can get the index of that pointer in level 3 by computing 96 / 32 = 3. This is the block at index 3 in level 3, which is block 9 in the above figure. What this means is that, given a level and a pointer, we can compute the block number, which we need to find its buddy. We also know what level to use based on the allocation size. For example, if we know the pointer points to a 32 byte block, we know to use level 3. This further means that we only need a pointer and the allocation size to find its buddy.

This is all good and well, but how can we go about implementing something like this and using it for allocations? Instead of looking at it like a memory buffer, let's instead look at it as a list of free lists. One free list for each level like in the following figure. Recall that because these are free lists, they would only link free blocks.



By using free lists in this manner, we simplify allocation greatly. The above figure illustrates a system with no more 256 byte blocks, one 128 byte block, and two smaller blocks that are free for use. If we want to allocate a new 128 byte block, we just remove one from level 1. If we want to free a 2^n byte block, we just add it back to the list at level n. Since these are free lists, we can store the LIST_ENTRY directly within the free blocks themselves.

Setup

Let us define some macros first. These implement the basic properties we discussed earlier.

#define MAX_LEVELS 32 /* 2^32 = 4GB. */
#define BLOCKS_PER_LEVEL(level) (1<<(level))
#define SIZE_OF_BLOCKS_AT_LEVEL(level,total_size) ((total_size) / (1<<(level)))
#define INDEX_OF_POINTER_IN_LEVEL(pointer,level,total_size) \
   ((pointer) / (SIZE_OF_BLOCKS_AT_LEVEL(level,total_size)))

The macro INDEX_OF_POINTER_IN_LEVEL is incomplete. The actual pointer we need to pass it must be relative to the beginning of our memory buffer. That is, we need to use pointer - memory_start inside of our calculation, so the macro becomes:

/* Corrected. */
#define INDEX_OF_POINTER_IN_LEVEL(pointer,level,memory_start,total_size) \
   (((pointer)-(memory_start)) / (SIZE_OF_BLOCKS_AT_LEVEL(level,total_size)))


This macro is a little complicated, but if you compare it with our original calculation in the previous section, it is nothing more than Pointer / Size of Blocks at Level. This will be very important later on when freeing.
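As a quick sanity check, we can plug the numbers from the 256 byte example into these macros (the unsigned casts here are additions for this sketch, so the pointer arithmetic is well defined):

```c
#include <stdint.h>

#define BLOCKS_PER_LEVEL(level) (1u << (level))
#define SIZE_OF_BLOCKS_AT_LEVEL(level,total_size) ((total_size) / (1u << (level)))
#define INDEX_OF_POINTER_IN_LEVEL(pointer,level,memory_start,total_size) \
   (((uintptr_t)(pointer) - (uintptr_t)(memory_start)) / \
    SIZE_OF_BLOCKS_AT_LEVEL(level,total_size))
```

With a 256 byte buffer, level 3 has 2^3 = 8 blocks of 256 / 8 = 32 bytes each, and a pointer at byte 96 lands at index 96 / 32 = 3 within that level, matching the worked example above.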

We also need a memory buffer to work with. Like with our other examples, we will just call _aligned_malloc. This gives us a PAGE_SIZE buffer that is page aligned.

void* memory_start = _aligned_malloc(PAGE_SIZE,PAGE_SIZE);

If you plan to use the buddy allocator for physical memory allocation, memory_start would either be 0 (base of memory) or the beginning of a memory zone (which we will discuss later.) This works because only free memory blocks will be on the free lists. ROM, device RAM, memory holes, etc., wouldn't be in them. Finally, we need our free lists.

LIST_ENTRY free_lists[MAX_LEVELS];

We then need to actually build the free lists. If this is for a physical memory manager, we would need to scan the memory map to build the free lists. For our example above, we only have one free memory area which is PAGE_SIZE bytes, so we only need to set one entry.

void setup() {
   LIST_ENTRY* link;
   index_t i;

   for(i = 0; i < MAX_LEVELS; i++)
      InitializeListHead(&free_lists[i]);

   link = (LIST_ENTRY*)memory_start;
   InitializeListHead(link);
   InsertTailList(&free_lists[0], link);
}

The first thing we do is loop through all free lists and initialize them. We then write a LIST_ENTRY to memory_start. This is the same trick we used earlier when we introduced free lists – we write the LIST_ENTRY directly at the start of each block. We then initialize that entry and add it to free_lists[0]. Now free_lists[0] points to one LIST_ENTRY which is at the beginning of memory_start.

free_lists[0] is special. It is level 0 in all of the figures presented so far. All figures at level 0 have one thing in common – the block size is the entire buffer size. In the above example, because we allocated a buffer of PAGE_SIZE bytes, free_lists[0] points to one free block that is PAGE_SIZE bytes. All of the other free lists are empty. This means we only have one free PAGE_SIZE block and nothing else. This is correct: since we didn't allocate anything yet, we have a PAGE_SIZE block of free memory!

Allocation

Alright, so the idea is that we keep on dividing the blocks by two until we reach the allocation size that we need. For example, let's say that we need to allocate 32 bytes from a 256 byte buffer. Since 256 / 2^3 = 32, the 32 byte blocks live at level 3 (recall that we started counting at level 0.) So our allocation function needs to take our buffer and divide it by two until we get a 32 byte block. Please see the following figure.


In the above figure, we allocate the first 32 byte block from the buffer. To do this, we took the 256 byte buffer and divided it into two 128 byte blocks. We then divide the first 128 byte block into two smaller 64 byte blocks. We then divide the first 64 byte block into two 32 byte blocks. Note how blocks 1 and 2 are buddies, blocks 3 and 4 are buddies and blocks 7 and 8 are buddies. We grayed the blocks that have been divided. Block 7 is then returned from the allocation function.

But keep in mind that these divisions are happening on the same 256 byte buffer. The above figure only visualizes how the buffer gets divided. The actual buffer looks like the following figure.


Compare these two images to see how the algorithm divides the buffer. Notice the first 32 byte block (block 7) is now allocated. Because we divided block 3, we ended up with a left-over 32 byte block (block 8). Because we divided block 1, we ended up with a left-over 64 byte block (block 4). And because we divided block 0, we ended up with a left-over 128 byte block. These left-over blocks are free for use the next time we need to allocate more memory. Our free lists reflect this below.


Compare the resulting free list to the preceding two figures. Because we have no more 256 byte blocks to allocate, the first list is empty. However, we do have one 64 byte block and one 32 byte block that can be allocated. If we allocated another 32 byte block later on, we could just take block 8. If we needed to allocate two 64 byte blocks, we can allocate block 4 and then divide block 2 into blocks 5 and 6 and then return block 5 (these blocks are shown in the figure at the beginning of the section.)

The dividing operation is recursive, so we can write the algorithm recursively as well. Let's start with the following.

void* alloc(size_t size) {
   index_t level = get_level(size);
   return _alloc(level);
}


Here, get_level just returns the zero based level number given a size. That is, if the size was 32 bytes, the level is 3 because 256/2/2/2 = 32. We then call _alloc which will be our recursive function.
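The text does not show get_level itself; one possible sketch, assuming the 256 byte buffer from the figures and a power-of-two request size, simply halves the block size until it matches:

```c
#include <stddef.h>

#define TOTAL_SIZE 256   /* size of the whole buffer, as in the figures */

/* Zero-based level whose block size equals the (power-of-two) request.
   The name get_level comes from the text; this body is one possible sketch. */
size_t get_level(size_t size) {
   size_t level = 0;
   size_t block = TOTAL_SIZE;
   while (block > size) {     /* halve until the block matches the request */
      block /= 2;
      level++;
   }
   return level;
}
```

A real implementation would also round non-power-of-two requests up and clamp against MAX_LEVELS.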

void* _alloc(index_t level) {

   void* memory;
   LIST_ENTRY* left;
   LIST_ENTRY* right;
   LIST_ENTRY* link;

   if (IsListEmpty(&free_lists[level])) {
      /* empty at this level; we will handle this case next. */
   }

   link = RemoveHeadList(&free_lists[level]);
   memory = (void*) link;
   return memory;
}


So, we make sure the free list at the level has something inside of it. If it does, then we can simply allocate from that free list. If it is empty, then we have a bit more work to do. We need to add a new block to the free list at this level so that it does have something in it. But where can this block come from? Look back at the free list figure above. Let's say we need a “32 byte block” but the “32 byte block free list” is empty. We created these 32 byte blocks by dividing larger blocks. If we can grab a bigger block from a lower level (say, the “64 byte block free list”) then we can split that block into two new blocks on the “32 byte block free list.” And if the “64 byte block free list” was also empty when we tried, then we can just try the “128 byte block free list” and divide that. We can keep moving to lower and lower levels (larger and larger blocks) as needed until we get a block that can be split in two. This is the recursive step.

We can allocate a bigger block from a lower level free list by calling _alloc again. If it is successful, we can split it into two blocks and insert them both into the free list at this level. The final code is below.

void* _alloc(index_t level) {

   uint8_t* memory;
   LIST_ENTRY* left;
   LIST_ENTRY* right;
   LIST_ENTRY* link;

   if (IsListEmpty(&free_lists[level])) {
      size_t size;

      if (level == 0)
         return NULL; /* nothing larger left to split. */

      memory = _alloc(level-1);
      if (!memory)
         return NULL; /* out of memory. */

      /* recall that the entire memory size was PAGE_SIZE. In the figures,
      we used a memory buffer of 256 bytes to keep them small. */
      size =  SIZE_OF_BLOCKS_AT_LEVEL(level,PAGE_SIZE);

      /* now we split this block into two. */
      left = (LIST_ENTRY*) memory;
      right = (LIST_ENTRY*) (memory+size);

      /* initialize them. */
      InitializeListHead(left);
      InitializeListHead(right);

      /* insert the two new blocks. */
      InsertTailList(&free_lists[level], left);
      InsertTailList(&free_lists[level], right);
   }

   link = RemoveHeadList(&free_lists[level]);
   memory = (uint8_t*) link;
   return (void*) memory;
}

Freeing

Given a pointer and a size, we can start like we did above.

void free(void* memory, size_t size) {
   index_t level = get_level(size);
   return _free(memory, level);
}


We did this because this operation is also recursive. Think back to the way we divided blocks in two. We can combine those same two blocks into a larger block when both of them are free. We can then combine those larger blocks into an even larger block, and so on. This is the recursive step. For example, we can combine block 7 and block 8 back into block 3. We can also combine block 3 and block 4 back into block 1.
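Within a level, buddies always sit at even/odd index pairs: (0,1), (2,3), (4,5), and so on. So a block's buddy index is found by flipping the low bit of its own index, and the buddy's address is one block size away in the appropriate direction. A tiny sketch (helper names are illustrative):

```c
#include <stddef.h>

/* Buddy of the block at `index` within one level: flip the low bit. */
size_t buddy_index(size_t index) {
   return index ^ 1;
}

/* Buddy address: even-indexed blocks have their buddy just after them,
   odd-indexed blocks just before. `block` is an offset into the buffer. */
unsigned long buddy_address(unsigned long block, size_t index, size_t size) {
   return (index & 1) == 0 ? block + size : block - size;
}
```

This parity test is exactly the `index & 1` check used in the _free code below.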

Freeing is slightly more complicated than allocating, so let's start like this.

void _free(void* memory, index_t level) {

   LIST_ENTRY* link;
   size_t size = size_from_level(level);

   link = (LIST_ENTRY*) memory;
   InitializeListHead(link);
   InsertTailList(&free_lists[level], link);
}


This is a good start. We add a LIST_ENTRY to the memory so that we can link it back into the free list. But how do we combine blocks back together? We need to find this block's buddy. Recall that the way we do this is to look at the block index number. Our code now looks like the following.

void _free(void* memory, index_t level) {

   LIST_ENTRY* link;
   index_t index;
   addr_t buddy;
   size_t size = size_from_level(level);

   index = INDEX_OF_POINTER_IN_LEVEL(memory,level,memory_start,PAGE_SIZE);

   if ((index & 1) == 0) buddy = (addr_t) memory + size;
   else                  buddy = (addr_t) memory - size;

   link = (LIST_ENTRY*) memory;
   InitializeListHead(link);
   InsertTailList(&free_lists[level], link);
}

Now that we have found the buddy, we need to check if it is also free. If it is still allocated, we can't just combine them as one big free block. Since these blocks are buddies, they have the same size and the same free list level. So we can quickly check by scanning the free list at free_lists[level] to find the buddy. Our code now becomes the following.

void _free(void* memory, index_t level) {

   LIST_ENTRY* link;
   LIST_ENTRY* buddy_link;
   index_t index;
   addr_t buddy;
   size_t size = size_from_level(level);

   index = INDEX_OF_POINTER_IN_LEVEL(memory,level,memory_start,PAGE_SIZE);

   if ((index & 1) == 0) buddy = (addr_t) memory + size;
   else                  buddy = (addr_t) memory - size;

   buddy_link = NULL;
   if (! IsListEmpty(&free_lists[level]))
      buddy_link = free_list_find(&free_lists[level], buddy);

   link = (LIST_ENTRY*) memory;
   InitializeListHead(link);
   InsertTailList(&free_lists[level], link);

   if (buddy_link == (LIST_ENTRY*) buddy) {
      /* both blocks are on the free list. */
   }
}

We can now go ahead and actually merge the blocks. This is done by removing both the current block and its buddy from the list at free_lists[level] and inserting the combined block into free_lists[level-1] as one free block. However, that larger block might itself be mergeable with its own buddy; we don't yet know. So instead of just inserting the block, we recursively try to free the larger block until we cannot merge any more. The following is the final code.

void _free(void* memory, index_t level) {

   LIST_ENTRY* link;
   LIST_ENTRY* buddy_link;
   index_t index;
   addr_t buddy;
   size_t size = size_from_level(level);

   index = INDEX_OF_POINTER_IN_LEVEL(memory,level,memory_start,PAGE_SIZE);

   if ((index & 1) == 0) buddy = (addr_t) memory + size;
   else                  buddy = (addr_t) memory - size;

   buddy_link = NULL;
   if (! IsListEmpty(&free_lists[level]))
      buddy_link = free_list_find(&free_lists[level], buddy);

   link = (LIST_ENTRY*) memory;
   InitializeListHead(link);
   InsertTailList(&free_lists[level], link);

   if (buddy_link == (LIST_ENTRY*) buddy) {
      RemoveListEntry(link);
      RemoveListEntry(buddy_link);
      /* the merged block starts at the lower of the two addresses. */
      if ((index & 1) == 0)
         _free(link, level - 1);
      else
         _free(buddy_link, level - 1);
   }
}

Using Bitmaps instead of Free lists

You can certainly use bitmaps instead of free lists for the implementation of the buddy allocator. Since the free lists only provide us the “free” blocks, we can mark those as “0” and all other blocks in a given level as “in use” with a “1”. Each level would have its own bitmap that gives us the state of every block at that level. The code simplifies slightly; however, we would also introduce a small performance penalty during allocations and frees. As an exercise, try implementing the above using a bitmap.

Slab Allocator

This last allocator is the most complex allocator that we will introduce here. It makes heavy use of free lists, even more so than the buddy allocator. It is a kernel object caching allocator and a general-purpose kernel allocator used in operating systems such as Solaris and Linux. The original design supported kernel object caching and construction. It was later revised and updated to support per-CPU caches to improve performance on multiprocessor systems, and cache coloring to improve CPU cache usage. We will cover the original design here. The revised design adds a whole layer on top of the original design and would just make things more complex. The slab allocator tends to be about 1,000 lines or more depending on the architecture of your design. Because of this, we will simplify the code for presentation purposes in the text.

We will first start by introducing the data structures that we will need. We will discuss them further next.

typedef struct _SLAB {
   CACHE*          cache;
   LIST_ENTRY      listEntry;
   size_t          objectCount;
   size_t          usedObjects;
   size_t          bufferSize;
   union {
      void*         freeList;
      LIST_ENTRY    bufferControlFreeListHead;
   }u;
}SLAB;

typedef struct _BUFCTRL {
   void* buffer;
   SLAB* parent;
   LIST_ENTRY entry;
}BUFCTRL;

typedef struct _CACHE {
   size_t            size;
   int               align;
   int               flags;
   SPIN_LOCK         lock;
   LIST_ENTRY        fullSlabListHead;
   LIST_ENTRY        partialSlabListHead;
   LIST_ENTRY        emptySlabListHead;
   LIST_ENTRY        listEntry;
}CACHE;

If we were also going to implement the revised and updated version, we would also have introduced a CPUCACHE structure. These structures might look daunting at first (caches, lists within lists, lists of free lists, BUFCTRL, and others). Let's walk through the design and hopefully make sense of these structures and what they are for.

Design

Oh boy...

4. General Purpose Allocators

You might have realized that we have not once mentioned the standard malloc and free. Everything we covered so far can be used as a basis to implement heaps in kernel and user space. Topics such as free lists, bitmaps, combining blocks, and caching are fundamental in general allocators that need to handle memory of arbitrary sizes. The slab allocator is an example of a real kernel heap allocator and is used by several operating systems, which is why we decided to cover it. However, user space allocators must be able to handle arbitrary allocation sizes. General allocators have their own set of problems and design issues that must be considered when implementing them.

Of course, these general purpose algorithms can also be applied in kernel mode to implement a simple kernel heap. If you need a simple allocator and do not have the need for anything as complex as the slab allocator, going with a best fit or first fit might be a better option.

Free Lists – Best Fit and First Fit
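While this section's full treatment is beyond what we can cover here, a minimal first-fit sketch conveys the idea: each block carries a header recording its size and state, allocation scans the list for the first free block that is large enough (splitting it if there is room), and freeing just marks the block free. All names here are illustrative, and a fuller version would also coalesce adjacent free blocks.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct _HEADER {
   size_t size;               /* payload size in bytes */
   int    free;
   struct _HEADER* next;
} HEADER;

static uint8_t ff_heap[4096];
static HEADER* ff_head = NULL;

void heap_init(void) {
   ff_head = (HEADER*)ff_heap;
   ff_head->size = sizeof(ff_heap) - sizeof(HEADER);
   ff_head->free = 1;
   ff_head->next = NULL;
}

void* ff_alloc(size_t size) {
   size = (size + 7) & ~(size_t)7;          /* keep headers aligned */
   for (HEADER* h = ff_head; h; h = h->next) {
      if (h->free && h->size >= size) {
         /* split if there is room for another header plus some payload */
         if (h->size >= size + sizeof(HEADER) + 16) {
            HEADER* split = (HEADER*)((uint8_t*)(h + 1) + size);
            split->size = h->size - size - sizeof(HEADER);
            split->free = 1;
            split->next = h->next;
            h->size = size;
            h->next = split;
         }
         h->free = 0;
         return h + 1;                      /* payload follows the header */
      }
   }
   return NULL;
}

void ff_free(void* memory) {
   HEADER* h = (HEADER*)memory - 1;
   h->free = 1;               /* a fuller version would coalesce neighbors */
}
```

Best fit differs only in the scan: instead of stopping at the first block that is large enough, it scans the whole list and picks the smallest block that fits, trading a longer search for less wasted space.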



Binning



5. Conclusion



Until next time,

~Mike ();

OS Development Series Editor


