The Total War codebase has been maturing for a little over twenty years now, and is starting to resemble a very fine, rich port. Shogun: Total War and Medieval: Total War were released on version 1, Rome: Total War and Medieval 2: Total War were released on version 2. Having got the hang of things by then, we moved to version 3. The new architectural approach of TW3 has kept us going since Empire: Total War, released in 2009.
By the time we reached this title we started to run into a problem. A strange error appeared, 0xC0000017 STATUS_NO_MEMORY (Not Enough Quota). It happened on all machines, regardless of the amount of RAM installed on the motherboard. I researched, and discovered that we had a finite amount of address space available to us. A 32-bit Windows process has 4GB of addressable storage, but half of that is reserved to the OS. We only had 2GB available to us. A little more digging around revealed to us that we could get another 1GB if we tweaked the OS environment and selected /LARGEADDRESSAWARE in the linker options, but it was not appropriate to ask users to make that kind of adjustment. Our players are a bright bunch, but we decided this was too much. We had to constrain and control our address space usage. Suddenly, we were overloading operator new.
Object storage duration
Object storage duration and linkage is a significant matter in C++. I usually include a question about it in interviews. You can read about it at cppreference.com
. Running out of storage means that you need to store fewer things, make the things you are storing smaller, or both. I know this sounds obvious, but it bears stating unambiguously. The total amount of data in a Total War game far exceeds 2GB. Nearly all the data is stored offline on a file system, added into the process when needed, and removed from the process when no longer required. However, “store fewer things” is easy to say and hard to do.
Consider automatic storage duration objects. These are kept on the stack. Making significant changes would require introducing tracking to every scope, which although feasible requires additional per-function code, much like exception handling. Anyway, the stack is already accounted for by the linker. It is allocated as part of the thread and although you can change the size of the stack in your linker options, it isn’t advisable. Filling and emptying it won’t change the amount of storage available to you, so it’s not worth pursuing savings in this storage class.
static and thread storage duration objects are a different matter. We can identify those by searching for the keywords static and thread_local. Unfortunately, this won’t catch ALL the static storage duration objects: anything defined at file scope has static storage duration. In addition to polluting the global namespace, this is another reason not to define objects at file scope but at anonymous namespace scope instead. They will still have static storage duration, but they will be easier to find.
Empire: Total War was made before C++11 was published, so we were making no use of thread storage duration objects. We had only a small number of static storage duration objects. This left us with reducing dynamic storage duration objects.
Recall how objects are managed. Some address space is found, an object is constructed into that space, the object does its object thing, the object is destroyed, and the address space is returned.
For automatic storage duration objects finding and returning memory is simply a matter of increasing and decreasing the stack pointer. Once the stack pointer gets too large, for example during recursion that goes too deep, no more storage of that class is available.
For static storage duration objects, the storage is found at the start of execution and returned at the end of execution. Static objects are created at startup, more precisely at static-init (except for static objects with function scope: they are created as their owning functions invoked) and destroyed at exit during static-deinit. For thread-local storage duration objects the story is similar, except that the storage is found as part of the creation and destruction of the thread. You could consider static-init and static-deinit as being the creation and destruction of the main thread.
For dynamic storage duration objects, the storage is found by calling operator new and returned by calling operator delete. This happens during the invocation of the new operator and the delete operator. I’ll rephrase that: the new operator calls operator new and then the constructor of the object being created, while the delete operator calls the destructor of the object being destroyed and then operator delete. Differentiate between the new operator and operator new, and the delete operator and operator delete. Indeed, if you look at std::allocator_traits
, you will see four important member functions, allocate, construct, destroy and deallocate.
Overloading operator new
There are rather a lot of operator new functions; the standard defines 22, 14 of which are non-member functions. How does the new operator know which operator new to call? It does so by considering the parameters passed to it. Ordinarily, there will be no parameters. You would simply write
auto t = new T(args);
The first parameter is implicit: the amount of storage to allocate. This would call operator new(std::size_t). If you already have an address p to put the object, you can write
auto t = new(p) T(args);
This would call operator new(std::size_t, void*).
As I said earlier: “Suddenly, we were overloading operator new”. We did this by declaring an overload of operator new and defining it inline to call our own function, ca_malloc, which was declared in our own namespace, which, as you may be able to guess, is CA for Creative Assembly. Our overload contained the filename and line number, so the signature looked like
void* operator new(std::size_t, const char*, int);
To invoke this, we had to replace every call to new with new(__FILE__, __LINE__). Then we realised that the release version of the game should call regular operator new, so we needed to replace our new invocations with a preprocessor macro. I first saw this trick in MFC in the early 90s.
After overloading operator delete to call ca_free, we could take a look at our allocation patterns. Not only that, we could see which objects were being released long after they were needed, or never released at all. This last category is a memory leak. Memory leaks are A Bad Thing as they consume resources for the remainder of the process. We got rid of our memory leaks, tracked them ferociously, named and shamed programmers who leaked resources, and kept track of everything. According to our accounting, we were well inside the 2GB limit. However, we were STILL running out of storage.
Accounting for overhead
Unfortunately, allocating 24 bytes doesn’t reduce your available storage by a mere 24 bytes: there is book-keeping to consider as well. The memory manager has to keep track of what is available and what is in use. This all takes additional storage, and the more allocations you have, the more storage you require to keep track of it all. We were unable to reduce the amount of objects we had and the storage they might require, which meant that we had to reduce the overhead by writing a better memory manager and invoking that instead.
This is not a trivial undertaking. However, operator new is a worst case scenario memory manager: it is a single function that knows nothing about how long something is needed for, nor how big anything is going to be. We knew a little more about our objects than the memory manager, and I wrote a series of memory managers to take advantage of this.
The first was the fixed-size memory manager. If you know that you’re going to be allocating lots of objects that are 24 bytes big, you can keep a big slab of memory to accommodate them all in consecutive slots. You can use a large bitfield to track which slots are occupied and which aren’t, and use the address of the allocation to identify which slot it came from.
Then there was the LIFO memory manager. If you know that you’re going to be creating things and destroying them in reverse order, then you don’t need to track anything at all. You can simply use some storage to accommodate them and create your own pointer to the next free space. This will point back and forth through the storage as items are created and destroyed. This is useful for traversing UI, which tends to embody a tree structure going forward and backwards along branches.
Next was the frame memory manager. If you know the upper limit of the capacity of a container, and you know it’s duration is within a particular stack frame, you can create an allocator on the stack before the container, and then pass that allocator to the container. This allows you to use automatic storage instead of dynamic storage, providing an instant win.
A variation of this was the duration memory manager. If you have a family of objects that you know are going to exist only within a particular frame, you can create a general purpose memory manager in automatic storage and ensure that all allocations made by this family of objects go through that memory manager. You can also check that there are no outstanding allocations when it is destroyed and eliminate memory leaks.
Replacing operator new
So far I have spoken about overloading operator new. By typing new(some parameters) you can customise your allocation. But what about containers? How do you overload their memory managers?
Recall that the standard containers all take an allocator parameter, which dictates how memory will be managed. You can simply replace the default parameter, std::allocator, with your own allocator, which defines the functions of std::allocator_traits as described earlier.
How about third-party code? What if you are using a third-party header only library that uses standard containers with default allocators? What if it makes direct calls to new? Then you have to REPLACE operator new.
This was easy enough in the Total War codebase. By defining void* operator new(size_t) as an inline function the compiler was able to resolve the call immediately, so no linkage was required. This replaced the implementation of operator new located in the C++ runtime library by not requiring the linker to search for it.
Unfortunately, the story doesn’t end there. If you grab a copy of the standard at wg21.link
and look for [replacement.functions] (it’s a big PDF, have patience) you will see that it’s not so easy. Particularly, regarding replacement functions,
The program’s declarations shall not be specified as inline. No diagnostic is required.
This was news to me when I upgraded the Total War codebase to support the v142 MSVC toolset. Suddenly, I got a new warning telling me that I couldn’t declare operator new inline. This was rather a change from no diagnostic required. We have a zero warnings policy (/W4 /WX) so I was required to address it.
I’ve been using the Microsoft C and C++ compiler since 1989 (I think). It has been quite permissive. For example, if a function takes a reference to a non-const object, for a long time I could simply create a temporary at the call site. Strictly speaking, this is only permissible for a function that takes a reference to a const object. There are several other examples, such as two phase lookup, which have been tightened up over recent years. Now it was the turn of inline replacement of operator new.
I didn’t worry: the Total War codebase has a foundation library that everything links to. I defined the functions there and hit build, sauntering over to the kitchen to make a pot of Earl Grey tea. On my return I was greeted with a raft of linker errors, telling me that operator new(size_t) was multiply defined in my foundation library and, of course, in the C++ runtime library.
The standard has very, very little to say about linkers. I was now in implementation-defined territory, relying on what I knew about the toolset and its history. The first thing I did was look at the command line, where I saw all the library inputs, some object files, and a batch of innocuous looking command line options, which I carefully examined nonetheless.
There didn’t appear to be anything in any documentation outlining how to replace operator new. There was no linker magic referred to. I switched the /VERBOSE flag on and examined the tens of thousands of lines of output. Clear as day, there was operator new, imported from the C++ runtime library.
I decided some experimentation was in order. I wanted to be absolutely sure that the command line was the only place that any library dependencies were being declared so that I could see if the library declaration order was significant. This was not trivial. There are three ways to tell the linker what libraries to search.
- The command line – the most obvious way
- Importing dependents – declaring in the project properties that a project is dependent on another
- #pragma comment(lib, <library name>) – the most insidious of all
Methods one and two are quite benign. You can explicitly say “look in hat.lib” by simply appending hat.lib to the command line, just as you do with object files. Importing project dependencies is reflected immediately in the command line as well. Method three is completely invisible. It plants a comment in the object file which signals the linker to find and import a library.
This has its advantages. By including a header for a third party library, you can automatically add a library to your imports and not have to worry about adding it to your command line. This is especially important if there are several versions (NDEBUG, _ITERATOR_DEBUG_LEVEL=2 and so on) and you don’t want to have to remember all the incantations. For example, when you add /MT to your compiler command line the correct comment is added depending on whether DEBUG is defined (static library, release version).
Unfortunately it obscures the logic by which it was imported, requiring you to spelunk your way through a pile of #include files. Fortunately there is a linker command to ignore these directives, /NODEFAULTLIB. Ordinarily you would use it to turn off explicit libraries, for example /NODEFAULTLIB:libcmtd, but if you simply pass it unqualified to the command line, all #pragma comment(lib) directives will be ignored.
I was now able to explicitly craft my command line, but it didn’t help. I needed to include the C++ runtime, and I needed to include my replacement operator new. Worse, sometimes I could get projects to link, while other situations would report failure, without apparent rhyme or reason. It seemed to be that if I put the C++ runtime as the last argument on the list, projects would generally link successfully, but it was not a guarantee.
Ask the implementers
I admitted defeat and asked the folk who work on the toolset for help. One advantage of attending committee meetings is that I get to meet these people, so I knew how to direct my request. I got a very enlightening response.
The new behaviour of warning about inline specification of replacement operator new was requested by the MSVC standard library developers, as such definitions can sometimes cause surprising behaviour. For example, imagine you accidentally define two replacements in the same project, perhaps one that does logging and one that does fixed-size allocation. When they are not specified as inline there is a redefinition error when you try to link. Conversely, when they are specified as inline then that is a violation of the One Definition Rule and the linker will pick one and discard the other. You might be expecting some allocations to be frame-allocated, but instead they would all be logged. Good luck debugging that.
The apparently non-deterministic linker behaviour for accepting replacement new and delete is a little more convoluted and deeply linker specific. This explanation is unique to the MSVC v142 toolset, so assume nothing about other implementations.
Firstly, of course it isn’t non-deterministic. Here are two considerations:
- Everything provided to the linker command line as an object file is opened, those symbols which are defined and referenced are retrieved, and then the libraries provided on the command line will be searched to resolve any remaining symbols.
- All libraries passed to the linker are equal. There is no preference given to user-provided libraries over implementation provided libraries. This includes the OS as well as the standard runtime. They are all equal.
Once all the explicit object files have been considered, the object files within the libraries on the command line are searched for remaining unresolved symbols. If such a symbol can be satisfied by an object file in a library, the entire object file is imported into the image. The linker will use the first such definition it finds, and it will not complain if more than one library could satisfy that definition: any others are simply ignored. However, once the linker has decided on an object file to include, it will ensure that no duplicates will be introduced, since the object file could also include definitions of other symbols that already exist, in addition to the symbol it was searching for.
This library resolution stage is recursive, since including an object file from a library can also bring in new undefined symbols, but this doesn’t mean that the search is restarted from the beginning of the command line. In fact, this algorithm is very complex, the result of many years of refinement, in order to reduce average link times.
As a result, including a replacement operator new definition in a library can be problematic, and the appropriate solution is to include your definition as an object file on your linker command line. Also, if you’re going to replace one operator new/delete pair, replace them all: you don’t want to find you’re suddenly changing to an array-new and returning to the standard library allocator without realising it. You’ll find them in the [replacement.functions] section mentioned earlier.
Although Total War has moved to 64-bit, we still need to track storage. It’s unlikely we will run out of address space, but our users are likely to run out of RAM. This is actually a harder problem, since different users have different RAM on their systems. Correctly overloading and replacing operators new and delete is a simple task but requires a steady hand at link time. I hope you have a clearer idea about how (and why) to do this: please ask questions in the comments.
I would like to thank Jonathan Emmett and YongKang Zhu from Microsoft for help and review in the preparation of this blog post.