Operating System Development Series
This series is intended to demonstrate and teach operating system development from the ground up.
26: Memory Allocation by Mike, 2016
"Controlling complexity is the essence of computer programming" -Brian W. Kernighan
Welcome!
In the previous chapter, we built a working multitasking operating system with tasks running as kernel mode processes. Throughout this entire series we have been avoiding a topic that we perhaps should have introduced earlier. Unfortunately, we cannot go much further without it – memory allocation.
We have also been too lenient on other topics, such as page swapping, recursive page directories, and address space management.
With a proper memory allocator, we can implement a kernel heap. With a good design, we can also extend the physical memory manager even further to improve page allocation and zone allocations. The topics on virtual memory are equally important. There is a big problem with the current virtual memory manager that will quickly reveal itself if we remove the identity mapping. We will address this problem later when we introduce recursive page directories.
Finally, we need a nice way to manage a process's virtual address space and working set list. These will allow us to map what we need properly and allow us to support multi-threaded user mode tasks.
We will first dive into allocation techniques, some of which we have introduced earlier, but will present a little differently here. These include the linear allocator, free list, free stack, and bitmap. We will then look at two more advanced allocators – the buddy allocator and slab allocator. The slab allocator is the most complex allocator presented in this chapter. We will then dive into allocator design. We will look at building kernel heap pools, physical page allocation, virtual page allocation, and how everything works together. In the next chapter, we will dive into address space management issues, including page swapping, page files, recursive page directories, and address space structures.
The first type of allocator that we want to present is the linear allocator. This is the simplest type of allocator to implement. Allocations are very fast and run in constant time. It does not support freeing individual allocated blocks. Instead, the entire buffer gets freed when you are done with all of it.
For example, assume start points to the beginning of the buffer and end points to the end of the buffer. Further, assume Current Offset stores the current byte offset from the start of the buffer. If we want to allocate n bytes from this buffer, all we need to do is return a pointer to the current location and update Current Offset += n. Translating into code we get the following.
uint8_t* _currentOffset;
void* _memoryEnd;
void* _memoryStart;

void* alloc(size_t size, align_t align) {
    void* memory;

    _currentOffset = alignup(_currentOffset, align);
    if ((void*)(_currentOffset + size) > _memoryEnd)
        return NULL;

    memory = _currentOffset;
    _currentOffset += size;
    return memory;
}
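As a self-contained sketch of the same technique (the static buffer, its size, and the names linear_alloc, linear_reset, and the alignup helper are our own for this example), including the "free everything at once" reset described above:

```c
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 512

static uint8_t  _buffer[BUFFER_SIZE];
static uint8_t* _currentOffset = _buffer;

/* Round a pointer up to the next multiple of align (align must be a power of two). */
static uint8_t* alignup(uint8_t* p, size_t align) {
    return (uint8_t*)(((uintptr_t)p + (align - 1)) & ~(uintptr_t)(align - 1));
}

void* linear_alloc(size_t size, size_t align) {
    void* memory;

    _currentOffset = alignup(_currentOffset, align);
    if (_currentOffset + size > _buffer + BUFFER_SIZE)
        return NULL;                     /* out of space. */

    memory = _currentOffset;
    _currentOffset += size;
    return memory;
}

/* Freeing individual blocks is unsupported: reset releases everything at once. */
void linear_reset(void) {
    _currentOffset = _buffer;
}
```

Note that there is no per-block free at all; the allocator is only useful when every allocation from the buffer has the same lifetime.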
The Free List is a linked list of free blocks of memory. Typically we would divide the memory into evenly sized blocks of n bytes. We can then build a linked list comprising these free blocks inside of the buffer. If a block is free, it is in the list. If it is not free, we don't need to worry about managing it.
So we need a linked list that is compatible with any type of block in the free list. We don't know nor care about the data in these blocks, but we do still need to store the list pointers somehow. So, we will introduce a new structure, LIST_ENTRY, as follows.
typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY* next;
    struct _LIST_ENTRY* prev;
} LIST_ENTRY;
Using LIST_ENTRY, we can store the forward and back links needed to construct the linked list anywhere we want. We can maintain just the linked list itself – which is exactly what we want.
The data structure above has some interesting properties. If we were to use it in some structure, we can recover that structure given only a LIST_ENTRY pointer and the offset of the LIST_ENTRY within the parent structure. For example,
typedef struct _FOO {
    int data1;
    int data2;
    LIST_ENTRY listEntry;
    int data3;
} FOO;

FOO bar;
Then &bar would give us the base location of the FOO object. We also know &bar.listEntry. We also know the offset of the listEntry field. The offset is &((FOO*)0)->listEntry. So (uint8_t*)&bar.listEntry - (size_t)&((FOO*)0)->listEntry points to the beginning of the parent structure FOO. This is how the CONTAINING_RECORD and OFFSETOF macros work. We can use CONTAINING_RECORD to get the parent structure of a LIST_ENTRY.
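For reference, here is a minimal sketch of how these two macros can be written; this is the classic formulation (the LIST_ENTRY and FOO definitions are repeated so the example is self-contained):

```c
#include <stddef.h>

typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY* next;
    struct _LIST_ENTRY* prev;
} LIST_ENTRY;

/* Offset, in bytes, of a field within a structure. */
#define OFFSETOF(type, field) ((size_t)&(((type*)0)->field))

/* Recover the parent structure from a pointer to one of its fields. */
#define CONTAINING_RECORD(address, type, field) \
    ((type*)((char*)(address) - OFFSETOF(type, field)))

typedef struct _FOO {
    int data1;
    int data2;
    LIST_ENTRY listEntry;
    int data3;
} FOO;
```

Given a LIST_ENTRY pointer pulled off a list, CONTAINING_RECORD(entry, FOO, listEntry) hands back the FOO that contains it.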
Okay, so a free list is just a linked list of free blocks in memory. The following figure illustrates this.
An interesting problem with free lists is how and where to allocate the LIST_ENTRY itself. At first, this appears to be a chicken and egg problem. However, since we are only using the free list to link known free blocks together, we can just use those same free blocks to store our LIST_ENTRY. In all of the figures throughout this chapter, we illustrate a LIST_ENTRY using a green rectangle. In the above figure, we have a singly linked list with only a next pointer that points to LIST_ENTRY objects that are stored inside of the free blocks themselves.
To make things simple, it is helpful to store the LIST_ENTRY right at the start of every free block. This makes allocation and freeing much easier. Since all blocks are the same size, a block that is not entirely used exhibits internal fragmentation. However, allocating and freeing blocks is really fast and runs in constant time O(1).
Setup
We will need a memory buffer and a pointer to it. For example, let's create a buffer of 512 bytes that will store our allocation blocks, each of which will be 32 bytes. We will call _aligned_malloc throughout the examples in this chapter to allocate our blocks of memory so that the returned memory is always page aligned. It is recommended to build the allocators in user mode to facilitate debugging. You can simply replace the call to _aligned_malloc with your own alloc_pages function later on. This function should allocate free physical pages and map them into the address space. You can also just set free_list to point to a heap store if you plan to use it as a heap allocator. In all cases, we need free memory to work with.
void* free_list = _aligned_malloc(512, PAGE_SIZE);
const int allocation_block_size = 32;
We then need to initialize the free list. Remember that each block is 32 bytes in this example, so we simply insert a LIST_ENTRY every 32 bytes to create the free list. Note how we set link->next of each entry to point to the next LIST_ENTRY.
LIST_ENTRY* link;
uint8_t* block = (uint8_t*)free_list;
const int block_count = 512 / allocation_block_size;

for (int i = 0; i < block_count - 1; i++) {
    link = (LIST_ENTRY*)block;
    link->next = (LIST_ENTRY*)(block + allocation_block_size);
    block += allocation_block_size;
}

/* last entry should point to NULL. */
link = (LIST_ENTRY*)block;
link->next = NULL;
The above code creates the free list. Note that each LIST_ENTRY is located right at the start of each allocation block (in this example, each allocation unit is 32 bytes.) We use the free allocation units themselves to store the linked list.
Allocation
To allocate, remove a block from the head of the free list and return it.
void* alloc() {
    void* allocation_unit;
    LIST_ENTRY* link;

    if (!free_list)
        return NULL; /* out of memory. */

    /* this is the first free allocation unit. */
    allocation_unit = free_list;
    link = (LIST_ENTRY*)free_list;
    free_list = link->next;
    return allocation_unit;
}
It is important to note that we are storing LIST_ENTRY at the beginning of each allocation unit. There is no reason why we can't instead store them at the end of the allocation units or somewhere in the middle. Depending on how you structure the free list, the code will be slightly different. The important thing though is to be consistent. You might also notice that our alloc has no size parameter. Since the returned allocation units are all the same size, we deemed it unnecessary.
Freeing
Given a pointer to a block, simply reinsert it into the free list.
void free(void* memory) {
    void* allocation_unit;
    LIST_ENTRY* link;

    allocation_unit = memory;
    link = (LIST_ENTRY*)allocation_unit;
    link->next = (LIST_ENTRY*)free_list;
    free_list = allocation_unit;
}
Since all allocation units in the free list are, well, free, inserting the memory block back into the free list effectively makes it re-allocatable. Since the memory is free, we can insert a new LIST_ENTRY at the start of the block, which we use to link it back into the free list.
Extending
We can extend the information we would like to store using CONTAINING_RECORD and a parent structure for LIST_ENTRY. For example,
typedef struct _BUFCTRL {
    int magic;
    int extra_information;
    LIST_ENTRY link;
} BUFCTRL;
We can add a similar structure at the end of each allocation unit to detect buffer overruns and corruption in internal memory structures. This is very important in kernel level code since we need to detect bugs in kernel code long before they can become catastrophic.
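As a sketch of the overrun-detection idea, the following places a hypothetical magic value at the end of each 32 byte allocation unit and verifies it on free. BUFCTRL_MAGIC, stamp_block, and block_is_intact are invented names for this example:

```c
#include <stdint.h>

#define BUFCTRL_MAGIC 0xB0F0CAFEu  /* hypothetical sentinel value. */
#define BLOCK_SIZE 32

typedef struct _BUFCTRL {
    uint32_t magic;
} BUFCTRL;

/* Place a BUFCTRL at the end of the allocation unit when the block is handed out. */
static void stamp_block(uint8_t* block) {
    BUFCTRL* ctrl = (BUFCTRL*)(block + BLOCK_SIZE - sizeof(BUFCTRL));
    ctrl->magic = BUFCTRL_MAGIC;
}

/* On free, verify the sentinel survived; a mismatch means someone wrote past the buffer. */
static int block_is_intact(uint8_t* block) {
    BUFCTRL* ctrl = (BUFCTRL*)(block + BLOCK_SIZE - sizeof(BUFCTRL));
    return ctrl->magic == BUFCTRL_MAGIC;
}
```

A real kernel allocator would panic or log diagnostics when the check fails rather than silently continuing.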
Free Stacks are very similar to free lists. In fact, they can be implemented using free lists. The only difference is the interface. We can just implement a free list but write alloc and free a little differently. Allocating would be the same as popping an item from the stack, and freeing would be the same as pushing the item back on the stack. Since the free stack is implemented as a free list, popping an item off the stack just involves removing the first element of the list and pushing an item on the stack just involves inserting the item into the list.
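The push/pop interface described above can be sketched as follows, using a singly linked LIST_ENTRY. The names stack_push and stack_pop are our own; push is free and pop is alloc:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY* next;   /* singly linked is enough for a stack. */
} LIST_ENTRY;

static LIST_ENTRY* stack_top = NULL;

/* push == free: the block becomes the new top of the stack. */
void stack_push(void* block) {
    LIST_ENTRY* link = (LIST_ENTRY*)block;
    link->next = stack_top;
    stack_top = link;
}

/* pop == alloc: remove and return the top of the stack. */
void* stack_pop(void) {
    LIST_ENTRY* link;

    if (stack_top == NULL)
        return NULL;            /* out of memory. */

    link = stack_top;
    stack_top = link->next;
    return link;
}
```

Both operations touch only the head of the list, so they run in O(1) just like the free list versions.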
Free lists and free stacks are used a lot in more advanced allocators, and you may find code that inlines these types of methods within more complicated code. So it is helpful to understand the underlying theory behind why free lists are used and how they work, so that you can recognize them in more complicated code.
Free Lists and Free Stacks as Physical Memory Allocators
Free stacks have great use in physical memory allocators. With paging enabled, we can map any physical page to any location in the virtual address space. The actual location of the physical page does not matter in most cases; that is, when allocating we just need a page – any page. Even if we need contiguous memory, we can get any pages and map them contiguously in the address space later. Because the actual location of the page does not matter, free stacks and free lists can be a very fast and efficient way to allocate physical pages, where each allocation unit is a single page. Of course, this might present some problems in more advanced designs – for example, if you want to support different page sizes (not just 4096 bytes) or want to deal with hardware that does require contiguous physical memory. We can support these by introducing zones, which we will talk about in a later section.
Also note that, because we store the LIST_ENTRY for free lists or free stacks directly in the free physical pages, this presents interesting problems with designs that support paging. That is, we can't simply write or read from the memory as we did in the above code samples. The pages need to be mapped into the address space first. This complicates our alloc and free code slightly.
void* alloc() {
    void* allocation_unit;
    LIST_ENTRY* link;

    /* this is the first free allocation unit (a physical address). */
    allocation_unit = free_list;

    /* map it into the address space. */
    allocation_unit = map(allocation_unit);

    /* read the next pointer from the mapped page. */
    link = (LIST_ENTRY*)allocation_unit;
    free_list = link->next;
    return allocation_unit;
}

void free(void* memory) {
    void* allocation_unit;
    LIST_ENTRY* link;

    /* this becomes the top of our stack. */
    allocation_unit = memory;
    link = (LIST_ENTRY*)allocation_unit;
    link->next = (LIST_ENTRY*)free_list;

    /* store the physical address, then unmap the page. */
    free_list = get_physical_address(allocation_unit);
    unmap(allocation_unit);
}
In the above example, alloc gets a physical address from free_list. This is the first allocation unit in the list and is what we want to return. But we need to first update free_list to point to the next entry, which means we need to read (LIST_ENTRY*)allocation_unit. But this is a physical address, not a virtual address, so we need to call map to map it into the address space first. After mapping it, allocation_unit points to the virtual address of the page we want, and we can now read link->next from it. We then return the virtual address of the page we just mapped.
To free, note that memory is a virtual address. Because it is already mapped, we can just write a new LIST_ENTRY into that page as we did above. Note that because free_list already stores a physical address, the link->next line is fine. However, we need to update free_list to point to the physical address (not virtual) of memory. So we call a new function, get_physical_address, to give us the physical address. This must happen while the page is still mapped; only then can we unmap the memory page, since we no longer need it.
The function get_physical_address can work by getting the page table index and directory table index from the virtual address itself (recall the virtual address format.) It can then look through the page tables to find the Page Table Entry (PTE) frame field, which will be the mapped physical address. The functions map and unmap just map and unmap a physical address.
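The address arithmetic involved can be sketched as follows, assuming standard 32-bit x86 paging with 4 KB pages. How you reach the PTE itself (walking the page directory, or a recursive mapping) depends on your design, so pte_to_physical here is a hypothetical helper that takes the PTE value as a parameter:

```c
#include <stdint.h>

#define PAGE_SIZE 4096

/* Combine the PTE's frame with the page offset of the virtual address.
   Bits 31..12 of a 4 KB PTE hold the physical frame; bits 11..0 of the
   virtual address are the offset within that page. */
uint32_t pte_to_physical(uint32_t pte, uint32_t virtual_address) {
    uint32_t frame  = pte & 0xFFFFF000u;
    uint32_t offset = virtual_address & 0xFFFu;
    return frame | offset;
}

/* Indices used to walk the paging structures for a virtual address:
   bits 31..22 select the page directory entry, bits 21..12 the page table entry. */
uint32_t pd_index(uint32_t virtual_address) { return virtual_address >> 22; }
uint32_t pt_index(uint32_t virtual_address) { return (virtual_address >> 12) & 0x3FFu; }
```

With these, get_physical_address reduces to: use pd_index and pt_index to locate the PTE for the address, then hand that PTE to pte_to_physical.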
A bitmap is an array of single bits, where each bit represents the state of a respective allocation unit. Allocation units can either be in use (allocated) or free. Given these two states, we can use a single bit to represent the state of the allocation unit. We first introduced bitmaps in our physical memory management chapter; however, they are a fundamental part of a number of advanced allocators, just like free lists and free stacks.
Setup
The bitmap requires some additional space to store the bitmap itself. It also needs some memory to work with to store the allocation units in. You can also store the bitmap within the free memory area, but need to be careful when reserving space. For our example code, we will call _aligned_malloc again to allocate free memory that is page aligned, and we will use the entire page for our free memory. This is where we will allocate from.
void* memory_start = _aligned_malloc(PAGE_SIZE, PAGE_SIZE);
Allocating
To allocate, scan all bits in the bitmap to find a free allocation unit, mark it as used, and return it. Due to the scan, this operation has a worst case of O(n) and can be much slower than free lists and free stacks.
void* alloc() {
    index_t bit;

    for (bit = 0; bit < allocation_unit_count; bit++)
        if (bit_test(bit) == 0)
            break;

    if (bit == allocation_unit_count)
        return NULL;

    bit_set(bit); /* mark the unit as in use. */
    return (void*)((uint8_t*)memory_start + (bit * allocation_unit_size));
}
In the above, we loop through all bits in the bitmap. We call bit_test, which returns the value of the bit inside of the bitmap. If it is 0, we have found a free allocation unit. If we don't find anything, we return NULL. If we have found one, we set the bit to mark the unit as in use; bit is the bit number in the bitmap, which is also the allocation unit number. So the allocation unit is at memory_start + (bit * allocation_unit_size).
Freeing
To free, simply clear the bit representing the state of the allocation unit.
void free(void* memory) {
    index_t bit;
    bit = ((uint8_t*)memory - (uint8_t*)memory_start) / allocation_unit_size;
    bit_clear(bit);
}
An interesting property of bitmaps is that, unlike free lists and free stacks, they can detect multiple free attempts of the same pointer. All we need to do is check the bit's state before clearing it. Compare our examples above with our full implementation in our physical memory manager chapter.
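The double-free check can be sketched as follows, with a small bitmap and invented helper names (bit_test, bit_set, bit_clear, free_unit). A set bit means in use and a clear bit means free, matching the alloc scan above:

```c
#include <stdint.h>

#define UNIT_COUNT 128
static uint32_t bitmap[UNIT_COUNT / 32];   /* zero-initialized: everything starts free. */

static int  bit_test(int bit)  { return (bitmap[bit / 32] >> (bit % 32)) & 1; }
static void bit_set(int bit)   { bitmap[bit / 32] |=  (1u << (bit % 32)); }
static void bit_clear(int bit) { bitmap[bit / 32] &= ~(1u << (bit % 32)); }

/* Returns 0 on success, -1 if the unit was already free (double free detected). */
int free_unit(int bit) {
    if (bit_test(bit) == 0)
        return -1;      /* never allocated, or freed twice. */
    bit_clear(bit);
    return 0;
}
```

A free list cannot make this check cheaply: once a block is relinked into the list, a second free of the same pointer silently corrupts the list.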
Many memory management systems use multiple allocation techniques. We call these hybrid allocators. For example, the buddy allocator makes heavy use of either the bitmap or free list techniques discussed in the previous section. The slab allocator also makes heavy use of lists of free lists. This section covers some hybrid allocators, beginning with the buddy allocator.
The Buddy Allocator is the first hybrid allocator that we will look at and is also one of the simplest. In order to see how it works, let's first take a single 256 byte buffer of memory and divide it by two. This gives us two smaller 128 byte buffers. If we divide them by two again, we get four 64 byte buffers. Finally, we will divide by two one more time to get eight 32 byte buffers. We will also number each block. Please see the following figure.
Each time we divide a block into two smaller blocks (for example, we divided block 0 to get block 1 and block 2) we will call these smaller blocks buddies of each other. So, for example, block 1 and block 2 are buddies. Blocks 3 and 4 are buddies. Blocks 5 and 6 are buddies. But blocks 4 and 5 are not buddies. Given any block, we can find its buddy by looking at its block number as shown above. By finding the buddy, we can combine the blocks back together. So, for example, if we divided block 1 (128 bytes) into block 3 and block 4 (both 64 bytes), blocks 3 and 4 are buddies. Knowing this, we can combine block 3 and block 4 back into block 1. In other words, we can divide a larger block into two and also combine those two smaller blocks back into that larger block.
You might have noticed that in the above all we did was keep dividing the buffer by two. This gives us some interesting properties:
Blocks Per Level = 2^level
Size of Blocks at Level = Total Buffer Size / Blocks Per Level
Index in Level of Pointer = Pointer / Size of Blocks at Level
For example, the number of blocks in level 3 is 2^3 = 8. The size of the blocks at that same level is 256 / 8 = 32 bytes. For the last example, let's say we had a pointer at byte 96 in the buffer. Then we can get the index of that pointer in level 3 by doing 96 / 32 = 3. This is index 3 in level 3, which is block 9 in the above figure. What this means is that, given a level and a pointer, we can compute the block number, which we need to find its buddy. We also know what level to use based on the allocation size. For example, if we know the pointer points to a 32 byte block, we know to use level 3. This further means that we only need a pointer and the allocation size to find its buddy.
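The buddy computation just described can be sketched directly in code. buddy_of is our own name for this example; offsets are relative to the start of the buffer, and size must be a power of two:

```c
#include <stdint.h>

/* Given a block's byte offset from the start of the buffer and its size,
   compute the offset of its buddy. Even-indexed blocks have their buddy
   immediately after them; odd-indexed blocks immediately before. */
uint32_t buddy_of(uint32_t offset, uint32_t size) {
    uint32_t index = offset / size;
    if ((index & 1) == 0)
        return offset + size;
    return offset - size;
}
```

For power-of-two sizes this is equivalent to the shortcut offset ^ size, which flips the single bit that distinguishes the two buddies.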
This is all good and well, but how can we go about implementing something like this and using it for allocations? Instead of looking at it like a memory buffer, let's instead look at it as a list of free lists – one free list for each level, like in the following figure. Recall that because these are free lists, they would only link free blocks.
By using free lists in this manner, we simplify allocation greatly. The above figure illustrates a system with no more 256 byte blocks, one free 128 byte block, and two smaller free blocks that are available for use. If we wanted to allocate a new 128 byte block, we just remove one from level 1. If we wanted to free a block belonging to level n, we just add it back to the level n list. Since these are free lists, we can store the LIST_ENTRY directly within the free blocks themselves.
Setup
Let us define some macros first. These implement the basic properties we discussed earlier.
#define MAX_LEVELS 32 /* 2^32 = 4GB. */
#define BLOCKS_PER_LEVEL(level) (1<<(level))
#define SIZE_OF_BLOCKS_AT_LEVEL(level,total_size) ((total_size) / (1<<(level)))
#define INDEX_OF_POINTER_IN_LEVEL(pointer,level,total_size) \
((pointer) / (SIZE_OF_BLOCKS_AT_LEVEL(level,total_size)))
The macro INDEX_OF_POINTER_IN_LEVEL is incomplete. The actual pointer we need to pass it must be relative to the beginning of our memory buffer. That is, we need to use pointer - memory_start inside of our calculation, so the macro becomes:
/* Corrected. */
#define INDEX_OF_POINTER_IN_LEVEL(pointer,level,memory_start,total_size) \
(((pointer)-(memory_start)) / (SIZE_OF_BLOCKS_AT_LEVEL(level,total_size)))
This macro is a little complicated, but if you compare it with our original calculation in the previous section, it is nothing more than Pointer / Size of Blocks at Level. This will be very important later on when freeing.
We also need a memory buffer to work with. Like with our other examples, we will just call _aligned_malloc. This gives us a PAGE_SIZE buffer that is page aligned.
void* memory_start = _aligned_malloc(PAGE_SIZE, PAGE_SIZE);
If you plan to use the buddy allocator for physical memory allocation, memory_start would either be 0 (base of memory) or the beginning of a memory zone (which we will discuss later.) This works because only free memory blocks will be on the free lists. ROM, device RAM, memory holes, etc wouldn't be in it. Finally, we need our free lists.
LIST_ENTRY free_lists[MAX_LEVELS];
We then need to actually build the free lists. If this is for a physical memory manager, we would need to scan the memory map to build the free lists. For our example above, we only have one free memory area which is PAGE_SIZE bytes, so we only need to set one entry.
void setup() {
    LIST_ENTRY* link;
    index_t i;

    for (i = 0; i < MAX_LEVELS; i++)
        InitializeListHead(&free_lists[i]);

    link = (LIST_ENTRY*)memory_start;
    InitializeListHead(link);
    InsertTailList(&free_lists[0], link);
}
The first thing we do is loop through all free lists and initialize them. We then write a LIST_ENTRY to memory_start. This is the same trick we used earlier when we introduced free lists – we write the LIST_ENTRY directly at the start of each block. We then initialize that entry and add it to free_lists[0]. Now free_lists[0] points to one LIST_ENTRY which is at the beginning of memory_start.
free_lists[0] is special. It is level 0 in all of the figures presented so far. All figures at level 0 have one thing in common – the block size is the entire buffer size. In the above example, because we allocated a buffer of PAGE_SIZE bytes, free_lists[0] points to one free block that is PAGE_SIZE bytes. All of the other free lists are empty. This means we only have one free PAGE_SIZE block and nothing else. This is correct – since we didn't allocate anything yet, we have a PAGE_SIZE block of free memory!
Allocation
Alright, so the idea is that we keep on dividing the blocks by two until we reach the allocation size that we need. For example, let's say that we need to allocate 32 bytes from a 256 byte memory buffer. Since 256 / 2^3 = 32, we should take the 32 byte block from level 3 (recall that we started with level 0.) So our allocation function needs to take our buffer and divide it by two until we get a 32 byte block. Please see the following figure.
In the above figure, we allocate the first 32 byte block from the buffer. To do this, we took the 256 byte buffer and divided it into two 128 byte blocks. We then divided the first 128 byte block into two smaller 64 byte blocks. We then divided the first 64 byte block into two 32 byte blocks. Note how blocks 1 and 2 are buddies, blocks 3 and 4 are buddies, and blocks 7 and 8 are buddies. We grayed the blocks that have been divided. Block 7 is then returned from the allocation function.
But keep in mind that these divisions are happening on the same 256 byte buffer. The above figure only visualizes how the buffer gets divided. The actual buffer looks like the following figure.
Compare these two images to see how the algorithm divides the buffer. Notice the first 32 byte block (block 7) is now allocated. Because we divided block 3, we ended up with a left-over 32 byte block (block 8). Because we divided block 1, we ended up with a left-over 64 byte block (block 4). And because we divided block 0, we ended up with a left-over 128 byte block. These left-over blocks are free for use the next time we need to allocate more memory. Our free lists reflect this below.
Compare the resulting free list to the preceding two figures. Because we have no more 256 byte blocks to allocate, the first list is empty. However, we do have one 64 byte block and one 32 byte block that can be allocated. If we allocated another 32 byte block later on, we could just take block 8. If we needed to allocate two 64 byte blocks, we can allocate block 4 and then divide block 2 into blocks 5 and 6 and then return block 5 (these blocks are shown in the figure at the beginning of the section.)
The dividing operation is recursive and so we can write the algorithm in this way. So, let's start with the following.
void* alloc(size_t size) {
    index_t level = get_level(size);
    return _alloc(level);
}
Here, get_level just returns the zero based level number given a size. That is, if the size was 32 bytes, the level is 3 because 256/2/2/2 = 32. We then call _alloc, which will be our recursive function.
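Before looking at _alloc, here is a minimal sketch of get_level, assuming the 256 byte buffer from the figures. A real implementation would also round the request up to a power of two and clamp it to the minimum block size:

```c
#include <stddef.h>

#define TOTAL_SIZE 256   /* buffer size used in the figures. */

typedef int index_t;

/* Zero-based level whose block size equals the requested size:
   level 0 holds one TOTAL_SIZE block; each level down halves the block size. */
index_t get_level(size_t size) {
    index_t level = 0;
    size_t block_size = TOTAL_SIZE;

    while (block_size > size) {
        block_size /= 2;
        level++;
    }
    return level;
}
```

Counting the halvings is exactly the 256/2/2/2 = 32 calculation from the text.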
void* _alloc(index_t level) {
    void* memory;
    LIST_ENTRY* link;

    if (IsListEmpty(&free_lists[level])) {
        /* to be filled in below. */
    }

    link = RemoveHeadList(&free_lists[level]);
    memory = (void*)link;
    return memory;
}
So, we make sure the free list at the level has something inside of it. If it does at this level, then we can simply allocate from that free list. If it is empty, then we have a bit more work to do. We need to add a new block to the free list at this level so that it does have something in it. But where can this block come from? Look back at the free list figure above. Let's say we need a “32 byte block” but the “32 byte block free list” is empty. We created these 32 byte blocks by dividing larger blocks. If we can grab a bigger block from a lower level (let's say, the “64 byte block free list”) then we can split that block into two new blocks on the “32 byte block free list.” And if the “64 byte block free list” was also empty when we tried to do it, then we can just try the “128 byte block” and divide that. We can keep going to lower and lower levels as needed until we get a block that can be split in two. This is the recursive step.
We can allocate a bigger block from a lower level free list by calling _alloc again. If it is successful, we can split it into two blocks by writing two LIST_ENTRY objects and inserting them both into the free list at this level. The final code is below.
void* _alloc(index_t level) {
    uint8_t* memory;
    LIST_ENTRY* left;
    LIST_ENTRY* right;
    LIST_ENTRY* link;

    if (IsListEmpty(&free_lists[level])) {
        size_t size;

        memory = (uint8_t*)_alloc(level - 1);
        if (!memory)
            return NULL; /* out of memory. */

        /* recall that the entire memory size was PAGE_SIZE. In the figures,
           we used a memory buffer of 512 bytes to keep them small. */
        size = SIZE_OF_BLOCKS_AT_LEVEL(level, PAGE_SIZE);

        /* now we split this block into two. */
        left = (LIST_ENTRY*)memory;
        right = (LIST_ENTRY*)(memory + size);

        /* initialize them. */
        InitializeListHead(left);
        InitializeListHead(right);

        /* insert the two new blocks. */
        InsertTailList(&free_lists[level], left);
        InsertTailList(&free_lists[level], right);
    }

    link = RemoveHeadList(&free_lists[level]);
    memory = (uint8_t*)link;
    return (void*)memory;
}
Freeing
Given a pointer and a size, we can start like we did above.
void free(void* memory, size_t size) {
    index_t level = get_level(size);
    _free(memory, level);
}
We did this because this operation is also recursive. Think back to the way we are dividing blocks in two. We can combine those same two blocks into a larger block when both of them are free. We can then combine those larger blocks into an even larger block, and so on. This is the recursive step. For example, we can combine block 7 and 8 back into block 3. We can also combine block 3 and block 4 back into block 1.
Freeing is slightly more complicated than allocating, so let's start like this.
void _free(void* memory, index_t level) {
    LIST_ENTRY* link;
    size_t size = size_from_level(level);

    link = (LIST_ENTRY*)memory;
    InitializeListHead(link);
    InsertTailList(&free_lists[level], link);
}
This is a good start. We add a LIST_ENTRY to the memory so that we can link it back into the free list. But how do we combine blocks back together? We need to find this block's buddy. Recall that the way we do this is to look at the block index number. Our code now looks like the following.
void _free(void* memory, index_t level) {
    LIST_ENTRY* link;
    index_t index;
    addr_t buddy;
    size_t size = size_from_level(level);

    index = INDEX_OF_POINTER_IN_LEVEL((addr_t)memory, level, (addr_t)memory_start, PAGE_SIZE);
    if ((index & 1) == 0)
        buddy = (addr_t)memory + size;
    else
        buddy = (addr_t)memory - size;

    link = (LIST_ENTRY*)memory;
    InitializeListHead(link);
    InsertTailList(&free_lists[level], link);
}
Now that we have found the buddy, we need to check if it is also free. If it is still allocated, we can't just combine them as one big free block. Since these blocks are buddies, they have the same size and the same free list level. So we can quickly check by scanning the free list at free_lists[level] to find buddy. Our code now becomes the following.
void _free(void* memory, index_t level) {
    LIST_ENTRY* link;
    LIST_ENTRY* buddy_link;
    index_t index;
    addr_t buddy;
    size_t size = size_from_level(level);

    index = INDEX_OF_POINTER_IN_LEVEL((addr_t)memory, level, (addr_t)memory_start, PAGE_SIZE);
    if ((index & 1) == 0)
        buddy = (addr_t)memory + size;
    else
        buddy = (addr_t)memory - size;

    buddy_link = NULL;
    if (!IsListEmpty(&free_lists[level]))
        buddy_link = free_list_find(buddy);

    link = (LIST_ENTRY*)memory;
    InitializeListHead(link);
    InsertTailList(&free_lists[level], link);

    if (buddy_link == (LIST_ENTRY*)buddy) {
        /* both blocks are on the free list. */
    }
}
We can now go ahead and actually merge the blocks. This is done by removing the current block as well as the buddy from the current list at free_lists[level]. We then insert the merged block into free_lists[level-1] as one free block. However, this larger block might itself be mergeable with its own buddy; we don't yet know. So instead of just inserting the block, we will recursively try to free the larger block until we cannot merge anymore. The following is the final code.
void _free(void* memory, index_t level) {
    LIST_ENTRY* link;
    LIST_ENTRY* buddy_link;
    index_t index;
    addr_t buddy;
    size_t size = size_from_level(level);

    index = INDEX_OF_POINTER_IN_LEVEL((addr_t)memory, level, (addr_t)memory_start, PAGE_SIZE);
    if ((index & 1) == 0)
        buddy = (addr_t)memory + size;
    else
        buddy = (addr_t)memory - size;

    buddy_link = NULL;
    if (!IsListEmpty(&free_lists[level]))
        buddy_link = free_list_find(buddy);

    link = (LIST_ENTRY*)memory;
    InitializeListHead(link);
    InsertTailList(&free_lists[level], link);

    if (buddy_link == (LIST_ENTRY*)buddy) {
        RemoveListEntry(link);
        RemoveListEntry(buddy_link);

        /* the lower of the two blocks becomes the start of the merged block. */
        if ((index & 1) == 0)
            _free(link, level - 1);
        else
            _free(buddy_link, level - 1);
    }
}
Using Bitmaps instead of Free Lists
You can certainly use bitmaps instead of free lists for the implementation of the buddy allocator. Since the free lists only track the “free” blocks, we can mark those as “0” and all other blocks in a given level as “in use” with a “1”. Each level would have its own bitmap that gives us the state of every block at that level. The code simplifies slightly; however, we would also introduce a small performance penalty during allocations and frees. As an exercise, try implementing the above using a bitmap.
This last allocator is the most complex allocator that we will introduce here. It makes heavy use of free lists, even more so than the buddy allocator. It is a kernel object caching allocator and the general allocator used in Solaris, Linux, and similar operating systems. The original design supported kernel object caching and construction. It was later revised and updated to support per-CPU caches to improve performance on multiprocessor systems, and cache coloring to improve cache usage. We will cover the original design here. The revised design adds another whole layer on top of the original design and would just make things more complex. The slab allocator tends to be about 1,000 lines or more depending on the architecture of your design. Because of this, we will simplify the code for presentation purposes in the text.
We will first start by introducing the data structures that we will need. We will discuss them further next.
typedef struct _CACHE CACHE;

typedef struct _SLAB {
    CACHE* cache;
    LIST_ENTRY listEntry;
    size_t objectCount;
    size_t usedObjects;
    size_t bufferSize;
    union {
        void* freeList;
        LIST_ENTRY bufferControlFreeListHead;
    } u;
} SLAB;

typedef struct _BUFCTRL {
    void* buffer;
    SLAB* parent;
    LIST_ENTRY entry;
} BUFCTRL;

struct _CACHE {
    size_t size;
    int align;
    int flags;
    SPIN_LOCK lock;
    LIST_ENTRY fullSlabListHead;
    LIST_ENTRY partialSlabListHead;
    LIST_ENTRY emptySlabListHead;
    LIST_ENTRY listEntry;
};
If we were also going to implement the revised and updated version, we would also have introduced a CPUCACHE structure. These structures might look daunting at first (caches, lists within lists, lists of free lists, BUFCTRL, and others). Let's walk through the design, and hopefully we'll get through these structures and what they are for.
Design
Oh boy...
You might have realized that we have not once mentioned the standard malloc and free. Everything we covered so far can be used as a basis to implement heaps in kernel and user space. Topics such as free lists, bitmaps, combining blocks, and caching are fundamental in general allocators that need to handle memory of arbitrary sizes. The slab allocator is an example of a real kernel heap allocator and is used by several operating systems. This is why we decided to cover the slab allocator. However, user space allocators must be able to handle arbitrary allocation sizes. General allocators have their own set of problems and design issues that must be considered when implementing.
Of course, these general purpose algorithms can also be applied in kernel mode to implement a simple kernel heap. If you need a simple allocator and do not have the need for anything as complex as the slab allocator, going with a best fit or first fit might be a better option.
Until next time,
~Mike ();
OS Development Series Editor