Real World Comparison, GC vs. Manual Memory Management

During the fourth semester of my studies I wrote a small 3D spaceship deathmatch shooter in the D programming language. It was created within three months and lets multiple players play deathmatch over a local area network. All of the code was written with a garbage collector in mind and made wide use of Phobos, the D standard library. After the project was finished I noticed how much time was spent on garbage collection every frame, so I decided to create a version of the game that does not use a GC, to improve performance.

In a PC game you usually want to achieve 60 FPS (frames per second). That means you have about 16.7 ms to simulate and render a single frame. You also have to prevent big variations in frame time, as these lead to visual stuttering or other issues.

I created three versions of the game for comparison:

  • GC Version compiled with DMD: The original version of the game, compiled with dmd 2.058 (-O -release -noboundscheck -inline). Not all memory is managed by the garbage collector: large blocks of memory that contain only data and no pointers are allocated manually to improve garbage collector performance. In this version the GC is run manually once every frame. Otherwise the frame times would vary too much, because you would get collection times of multiple seconds, which is not acceptable for games.
  • GC Version compiled with GDC: Same as above, just compiled with the GDC equivalent of 2.058 (-fno-bounds-check -frelease -O -finline-small-functions -findirect-inlining -fpartial-inlining -fpeephole2 -fregmove).
  • Manually Memory Managed Version compiled with DMD: I threw away most of Phobos and wrote my own replacements with different interfaces, as the Phobos interfaces are usually not suitable for manual memory management. Small parts of Phobos could be reused, for example std.traits. I also made quite a few changes to druntime: I added a reference counting mechanism to make threads reference counted, and a memory tracker that tracks memory leaks and reports them when the program exits. All parts of druntime that leaked memory during development have been fixed; for example, I implemented a non-leaking, cache-friendly hashmap. This version of the game uses manual memory management most of the time. Where manual management is not feasible, reference counting is used instead.
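The two GC-related techniques mentioned for the first version, keeping pointer-free data out of the GC heap and triggering the collection at a fixed point in every frame, can be sketched roughly as follows. This is an illustration and not the game's actual code: allocData, freeData and endOfFrame are hypothetical names, and using malloc for the manually allocated blocks is an assumption. Only GC.collect (core.memory) and malloc/free (core.stdc.stdlib) are real library calls.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

/// Hypothetical helper: allocates a block of plain float data outside the
/// GC heap. The collector never sees this block, so it does not have to
/// scan or track it. Returns null if malloc fails.
float[] allocData(size_t count)
{
    auto p = cast(float*) malloc(count * float.sizeof);
    if (p is null)
        return null;
    return p[0 .. count]; // contents are uninitialized
}

/// Hypothetical helper: releases a block obtained from allocData.
void freeData(ref float[] data)
{
    free(data.ptr); // free(null) is a no-op
    data = null;
}

/// Hypothetical per-frame hook: runs a full collection at a known point
/// in the frame instead of letting it happen at an arbitrary allocation.
void endOfFrame()
{
    GC.collect();
}
```

A game loop would call endOfFrame() once per iteration, after simulation and rendering. Note that memory obtained this way must not hold the only reference to GC-managed objects, because the collector does not scan it.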

Other than the memory management code, the code of the GC version and the manual memory management version is exactly the same. This makes it an ideal real-world comparison of GC vs. manual memory management.

Results

  • DMD GC Version: 71 FPS, 14.0 ms frametime
  • GDC GC Version: 128.6 FPS, 7.72 ms frametime
  • DMD MMM Version: 142.8 FPS, 7.02 ms frametime

GC collection times:

  • DMD GC Version: 8.9 ms
  • GDC GC Version: 4.1 ms
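To put the collection times in relation to the frame times, the small sketch below computes the share of a frame spent in the collector and the frame time that would remain without it. It assumes that the measured collection time is included in the reported frame time, which the numbers above suggest but do not state explicitly. The function names gcShare and frameWithoutGcMs are made up for this illustration.

```d
/// Fraction of a frame that is spent in garbage collection (0.0 to 1.0).
double gcShare(double gcMs, double frameMs)
{
    return gcMs / frameMs;
}

/// Frame time that would remain if the collection cost disappeared
/// entirely (assumes gcMs is part of frameMs).
double frameWithoutGcMs(double gcMs, double frameMs)
{
    return frameMs - gcMs;
}
```

Under that assumption, gcShare(8.9, 14.0) is roughly 0.64 for the DMD build and gcShare(4.1, 7.72) is roughly 0.53 for the GDC build, so in both GC versions more than half of each frame goes to collection. The remaining time would be about 5.1 ms (DMD) and 3.62 ms (GDC).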

The astonishing thing here is that the manually managed version compiled with dmd is still faster than the highly optimized GC version that GDC generates. I could not compile my manually managed version with GDC yet, as druntime for GDC has a completely different folder structure and quite a few modifications.

Most of these performance improvements come from the fact that the garbage collector no longer eats any time. But there is a second effect: if you write code with a GC in mind, you often do not think about the consequences of allocating memory, because the GC will deal with it. This often leads to very inefficient code. The best example is the comparison of TypeInfo objects in druntime, which is described below.

Biggest Performance and Memory Leaking Issues in D 2.0:

Here is a list of the most severe problems I found during development:

  • Comparison of TypeInfo objects in druntime is done by building two strings and then comparing those two strings. This will always leak memory and do a lot of unnecessary allocations, which are a performance bottleneck. I rewrote the comparison so it does not allocate a single byte.
  • Calls to the druntime invariant handler are emitted in release builds too, and there is no way to turn them off. Even if the class does not have any invariants, the invariant handler will always be called, walk the class hierarchy and generate multiple cache misses without actually doing anything.
  • The new statement will not free any memory if the constructor throws an exception, so you are forced to replace both the new and the delete statement. But you cannot replace the new statement with similarly nice syntax, especially for inner classes and arrays.
  • Inlining in DMD. DMD's inlining seems to be very minimal. Most of my overloaded operators are not inlined by dmd at all.
  • Array literals. They always allocate, even if they don’t have to, for example when assigning to a fixed-size array or when passing to a function with a scope parameter.
  • D-style variadic functions. Same as array literals, because internally they are rewritten to one. Especially for this kind of function I don’t see any reason why they should allocate on the heap.
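As an illustration of the point about new and delete, here is a minimal sketch of a replacement pair that releases the memory again when the constructor throws. It is not the code from my standard library: create, dispose and the Ship class are hypothetical names, and the sketch only covers plain classes, not inner classes or arrays. It relies on std.conv.emplace, core.exception.onOutOfMemoryError and the C allocator, which are existing library functions.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

/// Hypothetical replacement for `new`: allocates with malloc, constructs
/// the object in place and frees the memory if the constructor throws.
T create(T, Args...)(auto ref Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    void* mem = malloc(size);
    if (mem is null)
        onOutOfMemoryError();
    scope (failure) free(mem); // runs only if construction throws
    return emplace!T(mem[0 .. size], args);
}

/// Hypothetical replacement for `delete`: runs the destructor, frees the
/// memory and nulls the reference.
void dispose(T)(ref T obj) if (is(T == class))
{
    if (obj is null)
        return;
    destroy(obj);
    free(cast(void*) obj);
    obj = null;
}

/// Example class (made up) whose constructor can throw.
class Ship
{
    int hull;

    this(int hull)
    {
        if (hull < 0)
            throw new Exception("invalid hull value");
        this.hull = hull;
    }
}
```

With this, create!Ship(10) returns a usable object and create!Ship(-1) propagates the exception without leaking the malloc'ed block. Objects created this way are invisible to the GC, so they must not hold the only reference to GC-managed memory unless the block is registered with the collector.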

Currently only my modifications to druntime and Phobos are available on GitHub, but the full source code for my small standard library and the game will follow soon. All of this has been done with dmd 2.058 and the druntime and Phobos state of 2.058. Now that I’m done, I will update to 2.060 soon.

druntime (build with make -f win32nogc.mak, currently only works on Windows)
phobos (build with make -f win32nogc.mak, currently only works on Windows)
thBase (my standard library)

Update:
I found a piece of code that deliberately slowed down the simulation in case it got too fast. This code never kicked in with the GC version, because it never reached the margin. The manually memory managed version, however, did reach the margin and was slowed down. With this piece of code removed, the manually memory managed version runs at 5 ms per frame, which is 200 FPS and thus nearly 3 times as fast as the GC version compiled with DMD.
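A limiter of this kind typically looks something like the sketch below: if a frame finishes faster than some minimum frame time, the loop waits for the difference. This is a generic illustration and not the code that was removed; throttleDelay is a hypothetical name and the minimum frame time is a parameter because the actual margin used in the game is not stated here.

```d
import core.time : Duration, msecs;

/// Returns how long the loop would have to wait so that a frame lasts at
/// least minFrameTime. Returns a zero duration if the frame was already
/// slower than that, which is why such a limiter never triggers for a
/// slow build.
Duration throttleDelay(Duration frameTime, Duration minFrameTime)
{
    return frameTime < minFrameTime
        ? minFrameTime - frameTime
        : Duration.zero;
}
```

With a hypothetical margin of 7 ms, a 14 ms frame yields no delay at all, while a 5 ms frame is stretched by 2 ms. That is the pattern described above: the slower GC version is unaffected and only the faster version is held back.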