Templates across API boundaries: Implementing template generators

C++ is controversial: in any discussion about it, a substantial group of people will love it and another will hate it.

Above all, one feature highlights this divide: templates. Templates have many problems: they slow compile times, force implementations to be specified in headers, and can bloat generated code. However, they're also a powerful mechanism for programming against generic types.

In the first article in this series, Undo software engineer Stephen Cross explored various methods of using templates effectively, in an article for Embedded.com. See the original article.

In it, we saw how C++ templates have significant limitations, most notably that they can’t be used across API boundaries. We also investigated a few implementations for run-time templates, but found they either didn’t work or had unacceptable costs. In this part I’ll explain how run-time templates can be implemented in an efficient way using a new technique called ‘Template Generators’.


Template Generators

It turns out the stack isn’t workable, because the stack frames can disappear before we’ve actually used the interface reference to call a method. We can’t use the heap, since it requires management overhead and is largely opaque to optimizations.

Fortunately we can use generated code to solve this problem. Template generators are functions emitted by the compiler which can be called and will return template arguments. Sounds simple, right?

OK, let’s look at this example again:

void f() {
   g<int>();
}
 
template <typename T>
void g() {
   h<T, float>();
}
 
template <typename A, typename B>
void h() { }

 

Clearly, no one function knows all of the arguments to h(). g() knows what the second argument will be, but has no idea about the first argument. f() knows what it's giving to g(), but doesn't know anything about h().

However we can see there’s a clear stack pattern here. The top of the stack is f(), which passes its template arguments down into g(), etc. As we saw previously though, this ‘template’ stack outlives the run-time stack. Conveniently, however, this stack is unchanging.

So let’s write functions to represent this stack:

void root_template_generator(type* args) {
   args[0] = { int_vtable };
   g_template_generator(args);
}
 
void g_template_generator(type* args) {
   args[1] = { float_vtable };
   h_template_generator(args);
}
 
void h_template_generator(type* args) {
   // Nothing to do.
}

 

If you call root_template_generator() with a large enough array it’ll fill the array with the template arguments of h(). This is a start, but we need a way to get the template arguments of g() as well.


Paths
The issue we've hit here is that we have a tree with a root template generator, an intermediate template generator and a leaf template generator, and we want to be able to specify a path through that tree that supports early exit. Fortunately, we can represent the path extremely compactly as a binary string.

For example:

  • To get arguments of g(): ‘1’
  • To get arguments of h(): ‘10’

Firstly, we start every path with a '1' so that we can compute the beginning of the path efficiently when it's represented as an integer. Then, at each node in the template generator tree, there are a number of possible exit transitions. One of these is to return immediately with the current template arguments; the others continue down the tree.

In this case, we have the following transitions:

  • root_template_generator() sets up arguments for the first templated function and then always calls the next intermediate generator, so no bits are required in the path. It does, however, need the initial '1' for determining where the path begins.
  • g_template_generator() can either return template arguments (indicated by the path ending) or continue to h_template_generator() (represented with '0').
  • h_template_generator() can only return template arguments, an unambiguous choice, hence no bits are required in the path.

So we’ll have code like the following:

void root_template_generator(type* args, path_t path) {
   args[0] = { int_vtable };
   g_template_generator(args, path, get_path_start(path));
}
 
void g_template_generator(type* args, path_t path, size_t position) {
   if (position == 0) {
       // Path end.
       return;
   }
 
   // Expecting to see a ‘0’.
   assert(((path >> (position - 1)) & 0x1) == 0x0);
 
   args[1] = { float_vtable };
   h_template_generator(args, path, position - 1);
}
 
void h_template_generator(type* args, path_t path, size_t position) {
   assert(position == 0);
   return;
}

 

If we call root_template_generator with path = ‘1’, then we’ll get the template arguments for g(). If we call it with path = ‘10’, we’ll get the arguments for h(). Excellent.


Template Generators in References
Remember the problem from before? We had the following code:

template <typename T>
class Example {
     void do_thing();
}
 
interface DoThingable {
   void do_thing();
}
 
void f(Example<int>& object) {
   g(object);
}
 
void g(DoThingable& object) {
   object.do_thing();
}

 

Unlike the other solutions, template generators work nicely in this case: we store the template generator in the reference type. Hence the reference type is implemented as a struct:

struct {
   void* object;
   struct {
       __vtable_type* vtable;
       struct {
           void* root_template_generator;
           path_t path;
       } template_generator;
   } type_info;
};

 

We’ve got the object pointer, as you’d expect. We have a vtable, as you’d also expect; in Loci vtable pointers are not stored in the objects.

The new fields are the root template generator function pointer and the path value. When we call a method on the interface type, the generated code simply passes these two fields to the method. The method can then call the root template generator with the path to obtain its template arguments, and it can also call other functions and give them its template arguments (or types derived from them).

At this point, it’s easy to get confused; while we don’t know what method we’re calling at run-time, the template generator chain is derived from information known at compile-time. In the example above the template generator chain is constructed the moment we specify ‘Example<int>’. Then when we cast ‘Example<int>&’ -> ‘DoThingable&’ we put the template generator struct into the reference.

This is a significant overhead: our interface reference is around four times the size of a pointer. However this overhead only applies for references to interfaces; if you have a reference to a class or struct that will be the same size as a pointer. (Future work is likely to involve adding a way to specify that an interface reference cannot refer to a templated type, which will halve the size of that reference.)


Using template generators
While we know how template generators are created, we still don't know how they're used. Let's go back to our basic example:

void f() {
   g<int>();
}
 
template <typename T>
void g() {
   h<T, float>();
}
 
template <typename A, typename B>
void h() { }

 

The generated signature for f() is obvious, since it isn’t templated:

void f_impl();

For g() and h() we need to pass an extra argument for the template generator. This is the following struct: 

struct template_generator_t {
   void* root_fn;
   path_t path;
};

 

Hence:

void g_impl(template_generator_t template_generator);
void h_impl(template_generator_t template_generator);
 

As described previously, any function can get its template arguments by calling root_fn() and passing in the path.

The interesting part is how to create this structure in the first place. Here are our paths:

  • To get arguments of g(): ‘1’
  • To get arguments of h(): ‘10’

With this in mind, here’s our generated code for f(), g() and h(): 

void f_impl() {
   template_generator_t template_generator;
   template_generator.root_fn = root_template_generator;
   template_generator.path = 1;
   g_impl(template_generator);
}
 
void g_impl(template_generator_t template_generator) {
   // This is how we get our arguments.
   type_t template_args[MAX_TEMPLATE_ARGS];
   template_generator.root_fn(&template_args[0], template_generator.path);
 
   template_generator.path <<= 1;
   template_generator.path |= 0; // (for clarity; obviously this is a no-op)
   h_impl(template_generator);
}
 
void h_impl(template_generator_t template_generator) {
   // This is how we get our arguments.
   type_t template_args[MAX_TEMPLATE_ARGS];
   template_generator.root_fn(&template_args[0], template_generator.path);
}

 

Firstly, f() knows how the chain begins. So it specifies the root template generator function and a path consisting of just the initial '1' marker (i.e. an empty path). This will be enough to get g()'s arguments.

Inside g(), we have no idea who called us (in the hard case), but we do know that in the template generator chain a '0' needs to be added to the path to get us to h_template_generator(). Hence we left shift the path and OR in the new bit (a no-op here, since the bit is '0').

Finally, h() doesn’t pass its template arguments anywhere (it’s a leaf node), so no further path additions are needed.


Returning/passing a runtime-computed type
Runtime templates can be a confusing idea because templated types are part of a function signature. For example:

template <typename T>
T function() { … }

For this case we’ll generate the following:

void function(void* return_ptr, template_generator_t template_generator);
 

So we return the template type by passing in a pointer, which is where we store the result. By using indirection we ensure that function() can still support any type. Similarly, arguments of template type are passed by pointer. This strategy is very similar to how C++ non-POD types are passed.

From a performance point of view, indirection is bad. Fortunately we can tag this pointer with all sorts of nice properties like ‘noalias’, ‘nocapture’ etc., and then optimization passes will understand what’s going on.


Next time
So far we’ve seen how templates can be implemented via template generators. In the final article in this series, I explain how template generators can be optimized to minimize, or in many cases eliminate, their overhead.