Introduction

So a few days ago I put together a short essay on Go, Google's new programming language. It seemed to strike a nerve and quite a lot of people read it.

This made me a little uncomfortable (as well as causing me to scan the skies searching for the End Times) given that I gave the language a pretty hard time. There's actually quite a lot to like about Go; its interface feature is genuinely quite interesting, and I can forgive a language a great deal for having proper closures.

What made me so annoyed is that Go could have been so much more, if it had applied these interesting features more generally and adopted a few more other interesting features.

How much more? Let's see.

The joys of interfacing

Firstly, let's consider maps. Go has a builtin associative array type that stores an unorder group of items indexed by some arbitrary value. It looks like this:

var mymap = make(map[string]int);
mymap["one"] = 1;
mymap["two"] = 2;
print(mymap["one"]);

So let's express our map as an interface.

type MapStringInt interface
{
  Get(key string) int;
  Set(key string, value int) int;
};

var mymap = MakeMapStringInt();
mymap.Set("one", 1);
mymap.Set("two", 2);
print(mymap.Get("one"));

(I'm ignoring ranges for now.)

This has a huge advantage over the builtin type: any type which implements our Get() and Set() methods is automatically considered by the Go runtime and type system to be a MapStringInt. This means that we can have multiple different implementations, and they're all type compatible.

type HashMap ...;
func (self *HashMap) Get(key string) int { ... }
func (self *HashMap) Set(key string, value int) { ... }
func MakeHashMapStringInt() MapStringInt { ... }

type TreeMap ...;
func (self *TreeMap) Get(key string) int { ... }
func (self *TreeMap) Set(key string, value int) { ... }
func MakeTreeMapStringInt() MapStringInt { ... }

It doesn't matter whether I call MakeHashMapStringInt() or MakeTreeMapStringInt() --- the result will be a MapStringInt. Which means I can use it anywhere that I can use a MapStringInt. I've decoupled the implementation from the interface, which is what Go interfaces are all about.

This is particularly powerful given that Go's automatic interface extraction means that any type with Get() and Set() methods will automatically become compatible with MapStringInt.

func AddWidget(map MapStringInt, key string) { ... }

AddWidget(getDNSResolver(), "localhost");
AddWidget(getFileSizes(), "/etc/passwd");
AddWidget(getCustomerWeights(), "Mr Creosote");

In other words, I now have more functionality but have had to think less. That's a good combination.

Generic brands

All is not shiny, however. Our example is contrived and restrictive. We've hardcoded the fact that our map uses string keys and integer values into the type. We can do better than that if we invoke a feature called parametric polymorphism --- commonly called generics.

Let's invent some syntax.

interface Map<K, V>
{
  Get(key K) V;
  Set(key K, value V);
}

var mymap Map<string, int> = MakeMapStringInt();
mymap.Set("one", 1);
mymap.Set("two", 2);
print(mymap.Get("one"));

Our interface here has is describing a generic map from type K to type V. MakeMapStringInt() then returns a specific instance of the map where K is a string and V is an int.

There's nothing particularly exotic about this; Java programmers will find it all boringly familiar. However...

MakeMapStringInt() still needs to contain type-specific magic to actually create the concrete implementation of the map. This is ugly and restrictive; we don't want to have to ship MakeMapStringInt(), MakeMapStringFloat(), MakeMapIntString() etc functions in our runtime. So, we do this.

func MakeMap<K, V>() Map<K, V>
{
  type MapNode struct
  {
    next *MapNode;
    key K;
    value V;
  };

  type MapImplementation struct
  {
    rootnode *MapNode;
  };

  func (self *MapImplementation) Get(key K) V { ... }
  func (self *MapImplementation) Set(key K, value V) { ... }

  return new(MapImplementation);
}

What we've got here is a function that will create a map of any type. The function itself has type variables; it in turn creates private types using those type variables; then it defines methods on one of those types, instantiates it, and returns it as the new map.

It's important to note that MapNode and MapImplementation only exist inside the function, because they're using the type parameters that our function was called with. We can't define them as global types because we don't know what type K and V are. And because these types are private to the function, they're not visible to anything outside the function --- but it's still safe to return the instance of the map, because the compiler will ensure that the instance conforms to Map<K, V> for our specific K and V.

In other words, we have an entirely generic, but nevertheless type-safe, map implementation.

It is, unfortunately, still not quite right. The implementations of the Get() and Set() are going to need to compare map keys; however, since they don't know what type K is, they can't do this. So, we modify the function thuswise:

func MakeMap<K is Comparable<K>, V>() Map<K, V>
{
  ...
}

This syntax extension ensures that K, whatever it is, implements the Comparable interface. Comparable is in turn a global interface, implemented automatically by all our hypothetical Go variant's primitive types:

type Comparable<K> interface
{
  Equals(K this, K that) bool;
}

Objects that implement Comparable can be compared; therefore, our map implementation knows that whatever K is, it's safe to call the Equals() method on it, and can use this to distinguish between items in the list. The compiler won't let us call MakeMap() if K doesn't match Comparable<K>.

We can go further.

type Array<V> interface
{
  Get(index int) V;
  Set(index int, V value);
  Length() int;
};

This interface describes the functionality of a Go built-in array. Arrays have lengths, and integer indices. But... notice that our array is also a map? It conforms to Map<int, V>. This means that we can use our array everywhere that we can use a map.

func MakeArraySlice<V>(array Array<V>, lwb int, upb int)
    Array<V>
{
  type SliceImplementation struct {};

  func (self* SliceImplementation) Get(index int) V
  {
    return array.Get(index + lwb;
  }

  func (self* SliceImplementation) Set(index int, value V)
  {
    array.Set(index + lwb, value);
  }

  func (self* SliceImplementation) Length() int
  {
    return upb - lwb;
  }

  return new(SliceImplementation);
}

And this creates an array slice. It will work with any array, regardless of the implementation; the objects it returns are indistinguishable from normal arrays. (Why is SliceImplementation an empty struct? Because we need one to hang the methods from. All the information we need is accessed via upvalues, so we don't need to store any data in it.)

So, to summarise. Introducing the features described here would allow us to replace the built-in map, array and array slice types with equivalents written in Go. (You'd probably want the compiler to provide a little syntactic sugar to allow traditional syntax.) This would:

  • simplify the language semantics, as we've moved stuff out of the core language and into a library, allowing us to dispense with a whole bunch of weird special-case types;
  • reduce the number of special rules to remember, because all this functionality now behaves like any other interface;
  • reduce the amount of magic runtime support code, as the new library doesn't need to know about Go implementation details, where the current runtime does;
  • allow user types to interoperate with the standard types, allowing such things as a generic Sort() function to sort user data, making code faster and simpler;
  • vastly increased the expressiveness of the type system, allowing us to talk about more complex problems in a much cleaner and more maintainable away.

In other words, more maintainable, more functionality, but with less code and less complexity. Quadruple win!

Typing tutor

We've left Java behind some way back, but C++ is still keeping up with us, more or less. Java's generics system has some major holes in it that mean that you can't instantiate a type referred to by a type parameter. C++'s generics work differently, and are much more powerful despite (in fact, to a large extent because) being much cruder. I can do a version of the MakeArraySlice() function above in C++ and it works fine, although it's rather more code and much less readable.

This is one of the problems with C++; it's really wordy. Here's an example of some of the sort of code that people end up dealing with in C++:

AnObject<String, int, int> object =
  (AnObject<String, int, int>) malloc(
    sizeof(AnObject<String, int, int>));

This example is contrived, but I've actually had to do this sort of thing. I have to repeat the type signature three times; this is bad for readability and maintenance. Can't the compiler figure some of this out for itself?

The answer is yes. Yes, it can. This process is called type inference.

Go actually has limited type inference in it already. We can do:

var a = Fnord();

The compiler's already seen fnord(), so it knows what type it returns, so it knows what type a must be; therefore we can omit the type.

This is useful, but it could go so much further. Consider this function:

func Add(i int, j int) int
{
  return i + j;
}

In Go, there is no implicit coercion between numeric types; therefore we should be able to unambiguously rephrase it as:

func Add(i, j) int
{
  return i + j;
}

If the return type is int, then i and j must also be ints, because that's the only situation where + will return an int.

It goes further. Languages like OCaml allow us to do the equivalent of:

func Add(i, j)
{
  return i + j;
}

What are the types here? The answer is, any type that will fit.

The compiler analyses the code and constructs an implicit interface that matches the operations that it sees us do. In this case, it knows that i and j must be numbers, because it sees me adding them and only numbers support addition. So the function is treated as:

func add<T is Number>(i T, j T) T
{
  return i + j;
}

This then allows us to do:

a = add(1, 2));
b = add(1.1, 2.2);

But it won't allow me to do:

c = add(1, 2.2);

...because floats and ints cannot be added in Go.

The compiler doesn't just analyse single functions; it analyses our whole program. Consider this:

func Method1(o) { o.Foo(); }
func Method2(o) { o.Bar(); }
func Method3(o) { method1(o); method2(o); }

Type inference causes the compiler to treat this as:

type method1_interface interface { Foo(); }
func method1(o method1_interface) { o.Foo(); }

type method2_interface interface { Bar(); }
func method2(o method2_interface) { o.Bar(); }

type method3_interface interface { Foo(); Bar(); }
func method3(o method3_interface) { method1(o); method2(o); }

The beauty of this is that the compiler will detect inconsistencies. If it sees us create an object which has a Foo() method but no Bar() method, and then call method3() on it, it will correctly fail to compile because it's illegal code. We get the the minimalist expressivity of a duck typed language with the robustness of a statically typed language.

The usefulness of type inferencing increases enormously when combined with generics; most of those complicated type parameters just vanish...

func sum(array, lwb, upb)
{
  var total = array[lwb];
  for i := lwb+1; i < upb; i = i + 1
  {
    total = total + array.Get(i);
  }
  return total;
}

...versus:

func sum<T is Number>(array interface { Get(i int) T; },
  lwb int, upb int) T
{
  var total T = array[lwb];
  for i := lwb+1; i < upb; i = i + 1
  {
    total = total + array.Get(i);
  }
  return total;
}

This will add a range from anything with a Get() method that takes an integer key and returns a number; such as an array of ints, or a map with int keys and float values, etc.

But if we try to use it with an array of strings, we'll get an error at compile, because our program is violating type safety.

Here's one of the classic examples of type inference in action.

func ProcessList(list, callback)
{
  var newlist = MakeList();

  for
  {
    var item = list.NextItem();
    if (item == nil)
    {
      break;
    }

    var newitem = callback(item);
    newlist.add(newlist);
  }

  return newlist;
}

This takes a list, and creates a new list by applying the callback to each item in it. (Functional programmers will recognise this as the map operation.)

Despite the absence of overt type information, this function has quite strict requirements. list must support the next() method, returning a type T, and callback must be a callable thing taking a single T parameter, and returning a T'. T itself must be nillable. It's obvious that newlist must be a list because we're calling MakeList() to construct it, but a further constraint is on it that it must be a list of T' (or at least, support add()ing T' to it).

It's interesting to note that item and newitem must be different variables --- we can't just say:

item = callback(item);

...because now we're adding a constraint that callback() must take and return the same type. Type inference doesn't make the static typing go away; it just lets us ignore it a lot of the time!

Type inference isn't perfect, of course. This kind of code analysis is a theoretically hard problem and the compiler simply can't do it in some cases, in which case we will have to provide type information to cue the compiler --- OCaml programmers will be used to this. But it's enormously useful nevertheless. It allows us to write code as if ew're in a duck typing language like Lua with all the safety of a statically typed language like C++. It particularly comes into its own when refactoring, as we can rapidly change the types of my objects by simply changing a few top level interfaces and everything else follows on automatically. And the simple advantage of reducing the amount of noise in my code as we remove a lot of useless type declarations is surprisingly worthwhile.

In other words, less work with more flexibility, improved maintainability and improved safety. Another quadrupal win!

The holy trio

So here's what these features give us:

  • interfaces allow a fine-grained mechanism for decoupling the function of a type from its implementation.
  • generics provide a formal, robust way of describing and implementing algorithms that work on sets of related types.
  • type inference lets us ignore most of this formalism and instead concentrate on solving the problem at hand.

The three features dovetail beautifully. The additional formalism and more cumbersome syntax imposed by generics are mediated by type inference, which allows us to omit information when the compiler can figure it out for itself. Interfaces allow us to name and specify sets of operations and so provide information to the type inference engine and generics system about what types do.

The end result is a language that, to the user, is much more expressive as we can more precisely define what problem we're solving; much more concise, as we can solve our problem without needing as much boilerplate; much more robust, as our solution can be applied to any data set that's compatible; much faster to code in, as we can concentrate on the abstract problem without babysitting the compiler; and easier to extend, as we automatically get code reuse where the interfaces are compatible. In other words, better.

What it's not, alas, is easier to implement. This kind of language is really hard to write a compiler for, and I don't know any mainstream procedural language that supports this sort of thing. Up until now type inference, in particular, has been largely the province of the functional language world --- I stole most of the concepts above from OCaml. Compiling generics efficiently is another hard problem; there's a nasty tradeoff between slow code that works with all types (which is what Java does) and producing multiple copies of fast code for each type (which is what C++ does).

But these problems have been solved. The functional language world came up with a decent theoretical basis for parametric polymorphism long ago, and have been playing with type inference solvers for decades. JITs are really good at coping with generics, because they will produce fast code on an as-needed basis; conventional compilers have to produce all their code ahead of time. There's no reason at all why someone couldn't produce a modern, mainstream procedural language that supported the features I have described above.

This is why I wish Go were that language. Having such a thing would be a genuine step forward in the Art; as it stands it's just a stumble sideways. The new language would fall right into the same ecological niche that languages like Python and Ruby would, but also be type-safe and much more robust. And if they could also support some of the other functional-language oddities such as higher-order functions and currying, that would be even better!

But unfortunately not. At this point I strongly suspect that Go isn't going to become such, as I'd imagine that the compiler would need to be reworked from the ground up to support these features. But I can always hope...

More information

You might want to go and look at OCaml, which is an industrial strength functional programming language with the features described above; it compiles into real machine code and can be used for procedural programming as well, although that's not it's strong point.

Java programmers might be interested in Scala, which is an experimental hybrid programming language producing JVM bytecode. It's got some of these features but isn't as orthogonal as OCaml due to restrictions in the Java virtual machine.

Wikipedia has good pages on generic programming and type inference.

Bugs?

Any obvious mistakes, omissions, or comments? Please let me know! You can use the comment box below or email me directly. Yes, I'm aware that very few of my Go code snippets above actually compile...