Garbage Collection in Minim
Currently, one of the key issues in Minim is the lack of garbage collection.
To handle precise memory management, I implemented a weird owner-and-reference
system that has been quite tiresome to keep track of.
To solve this, I am developing a garbage collector for the Minim, and I recently
merged
a working version into the project for testing.
The garbage collector greatly simplifies the code base by getting rid
of calls to free()
and removes most instances of copying objects.
In terms of performance, Minim is now much slower, anywhere between 2x and 4x.
Much of this slow down is beacuse allocated objects are no
longer freed precisely, and there is additional overhead
from tracking allocations.
The Minim GC is a generational, conservative, mark-and-sweep garbage collector implemented in C based on the Tiny Garbage Collector, a minimal garbage collector. It expands on the TGC by separating allocations into two generations, young and old. The younger generation is marked and swept every cycle, by default every time new allocations total 8 MB in size. Any surviving allocations are moved to the older generation. The older generation is marked and swept every 15 cycles.
The stack is swept from the “bottom” of the stack, provided during
initialization, to a local address within the sweeping function.
A neat trick to always forcing pointer addresses on the stack no matter
the optimization level is to use the setjmp(jmp_buf env)
function
from setjmp.h
which saves the values of registers to set a jump point.
It makes sense that this function exists, but I honestly didn’t realize
it could be used for this mechanism.
Like the TGC, the Minim GC provides a way to associate a destructor with an allocated block in case memory allocated outside of the garbage collector needs to be freed upon sweeping. However, it also provides a way to associate a “marking” function with each memory block. These functions precisely mark internal pointers so that the garbage collector doesn’t naively mark every possible pointer within the memory block. These functions can cause issues if a developer changes a struct but not its respective marking function; I ran into this issue on a few occasions when switching Minim to use the garbage collector. In addition, atomic allocation macros are provided to allocate data not containing any pointers.
Although naive, the Minim GC has made working with the Minim code base much easier since I don’t have to spend hours dealing with segmentation faults. If I had to make Minim all over again, I definitely would have stuck with using a garbage collector from the beginning. It eliminates my owner-and-reference system that was causing quite a headache when dealing with copies of objects. Now the same object can be in a list, hash table, and vector, all at the same time, since immutability is respected.
As for perfomance issues, I managed to claw back almost all of the slow down by adding tail call optimization which ensures the amount of allocated memory and the size of the stack remains quite small. This garbage collector along with tail call optimization, quasiquoting, and syntax macros will be in the next release, and is currently in the main branch.