cowbel

Published: 2016 November 3

Index

About

cowbel is a minimalist but expressive statically typed programming language which compiles into real machine code via C. It has generics, interfaces, a garbage collector, non-nullable types, an object-oriented system based on interfaces and composition, and features a familiar Javascript-like syntax.

The purpose of cowbel is to fit the niche where hard-core systems languages like C or C++ are too much work, but dynamic or virtual-machine languages like Python or Java are too heavyweight. cowbel programs compile into standalone executables with few dependencies that have comparable size and performance to C.

cowbel is hosted on GitHub.

2012-03-20

Version 0.2 released!

This features lots of major compiler core improvements, the biggest of which is a global type inference engine, which is used by the code generator to eliminate virtual method calls whenever possible. The end result of this is much, much better code --- cowbel now performs close to C for a lot of tasks.

In addition the compiler core is much more orthogonal and robust, and a lot of things that were a bit dubious before (like forward references) are now much more solid.

The runtime library has been extended so it now supports Array<V>, Map<K,V> and Set<V> as standard data structures; File and streaming I/O have been sanitised and should be much easier to use; and there's a new PCRE module providing basic support for regular expressions.

Installation

You can get the cowbel compiler either from the link below, or, if you want a specific version, from the GitHub download site.

Reference

See some example code

Frequently Asked Questions about cowbel

A brief summary of the language

The runtime library reference

Status

cowbel is brand new, and the compiler is still unfinished. Some language features remain unimplemented (such as object composition), and it's rather brittle in a lot of ways (type checking isn't as rigorous as it should be, and invalid programs may either cause the compiler to crash or generate invalid code). In addition, the runtime library is very minimal, as I'm still learning how to write idiomatic code in cowbel. There are no debugging features. There are bugs.

In particular, right now the language design is in flux. Syntax and semantics may change without warning.

So while it is possible, right now, to write useful programs in cowbel, and indeed I would very much like people to, the process is not as smooth as it should be. Currently cowbel is aimed at language hackers.

Some Examples

Hello, world

The original and best example program.

#include "SimpleIO.ch"

println("Hello, world!");

In cowbel, the program body is the main function, so the only boilerplate you need is the I/O library. (An empty file is a valid cowbel program that does nothing.)

FizzBuzz

FizzBuzz is the classic bottom-level interview question, designed to demonstrate basic competency. The problem is: write a program that displays the numbers from 1 to 100, except that for multiples of three it prints "Fizz" instead of the number, and for multiples of five it prints "Buzz". For numbers which are multiples of both three and five, it prints "FizzBuzz".

Here's how to do it in cowbel.

#include "SimpleIO.ch"

for i = 1, 101
{
  if ((i % 15) == 0)
    println("FizzBuzz");
  else if ((i % 3) == 0)
    println("Fizz");
  else if ((i % 5) == 0)
    println("Buzz");
  else
    println(i.toString());
}

(Yes, the 101 is intentional --- for loops take values in the range [from, to), so the upper value is exclusive.)

99 Bottles of Beer

Another example favourite.

#include "SimpleIO.ch"

function bottle(n: int)
{
  printi(n);
  if (n == 1)
    print(" bottle");
  else
    print(" bottles");
}

function sing(verses: int)
{
  do
  {
    bottle(verses);
    println(" of beer on the wall,");
    bottle(verses);
    println(" of beer,");
    println("Take one down, pass it around,");

    verses = verses - 1;

    bottle(verses);
    println(" of beer on the wall.");
    println("");
  }
  while (verses > 0); 
}

sing(10);

There shouldn't be any surprises here. Note the Javascript-style functions; these can be arbitrarily nested. Types on function parameters (and on return parameters) are mandatory.

Generics

Cowbel has support for generics in both types and functions. The standard library contains simple implementations of generic arrays and maps which can be used with any (compatible...) type.

#include "SimpleIO.ch"
#include "Map.ch"

var map = Map.New<int, string>();
map.put(1, "one");
map.put(2, "two");
map.put(3, "three");
map.put(4, "four");

function lookup<K, V>(map: Map<K, V>, key: K)
{
  var value = map.get(key);
  println("The value of " + key.toString() + " is " + value);
}

lookup<int, string>(map, 3);
lookup<int, string>(map, 1);
lookup<int, string>(map, 4);
lookup<int, string>(map, 2);

qsort

Here's a more complex example that combines interfaces, generics, nested functions, recursion, and a few other features. This is an implementation of qsort which will sort any object that implements Array<T> for any T which implements the < operator. (This is the code that's in the standard library.)

function Sort<T>(array: Array<T>)
{
  function swap(a: int, b: int)
  {
    var t = array.get(a);
    array.set(a, array.get(b));
    array.set(b, t);
  }

  function sortlet(left: int, right: int)
  {
    if (left < right)
    {
      var pivot = array.get((left + right) / 2);
      var ln = left;
      var rn = right;
 
      do
      {
        while (array.get(ln) < pivot)
          ln = ln + 1;
        while (pivot < array.get(rn))
          rn = rn - 1;

        if (ln <= rn)
        {
          swap(ln, rn);
          ln = ln + 1;
          rn = rn - 1;
        }
      }
      while (ln <= rn);

      sortlet(left, rn);
      sortlet(ln, right);
    }
  }

  var lo, hi = array.bounds();
  sortlet(lo, hi-1); 
}

The FAQ

Why did you write cowbel?

In 2009, Google announced a shiny new programming language, Go. I loathed this on sight, and gained my 15 seconds of internet fame with a badly written essay comparing Go unfavourably to Algol-68.

(If you're interested, here's a link.)

A week or so later, I wrote another essay, putting forward some ideas about what kind of language Go should have been. Nobody read it.

(Please?)

Since then, I decided to put my money where my mouth was and actually implement that language. Programming languages are harder than they look, and it took me several tries, but cowbel is that language.

Why is it in Java? Shouldn't all serious compilers be self-hosting?

Yes, I admit it. This is a total copout. It's just that Java's tooling is so superb that given how much redrafting the cowbel codebase is getting, it doesn't make sense to use anything else right now. Sorry.

Maybe once cowbel stabilises and gets some decent library support I'll rewrite the compiler in cowbel.

What sort of niche is cowbel aimed at?

It's aimed squarely at the narrow gap between hard-core low-level languages like C and C++ and the much more heavyweight VM-based languages like Java or Python. So it compiles into real machine code... but it's got a garbage collector. It's object based... but has no reflection.

It's intended to produce small, relatively standalone executables for use in systems programming. You wouldn't write an operating system kernel in it, but it's highly suited for daemons.

What's with the operator precedence?

If you're used to Algolalikes, cowbel's operator precedence may come as a surprise. This is because technically, cowbel has no operators.

The table of precedence is as follows:

lowest   infix operators
         prefix operators
         method calls or function calls
highest  parentheses, constants, identifiers

This means that all infix operators have the same precedence, and are therefore evaluated left-to-right.

The reason for this is that cowbel treats all operators as method calls. The language itself has no knowledge of what the operator means, and therefore cannot, for example, parse * at higher precedence than +.

Why can't I create null pointers?

Null pointers are now generally reckoned not to be a good idea. They add a failure case that the programmer needs to think about to every single pointer dereference in the program. For a pointer-centric language like cowbel, where all variables are pointers, I don't think this is a good idea.

In addition, supporting null pointers leads to an unpleasant degree of non-orthogonality to the language: why should some types be allowed to be set to null while other types (such as primitive types) can't? This makes the language much harder to reason about and adds nasty edge cases. For example, we can't infer the type of null.

There are situations where you genuinely, really need a pointer that can be unset. Cowbel provides the Maybe type (see Maybe.ch in the runtime library) to meet this need.

Do cowbel generics use code replication or type erasure?

Code replication. While it does involve generating more code, it's basically less trouble and avoids the need to do explicit upcasts. (Currently cowbel doesn't support upcasting. Anywhere.)

At the moment the type and function inflation is rather conservative and will produce multiple copies of identical functions in places where it really ought to be producing just one copy. This needs attention, but as it's just an optimisation and not an actual language bug, I'm letting it pass for now.

Odd stuff happens when I try to use a variable before I declare it.

In cowbel, all symbol and type declarations are hoisted to the top of their scope. This is to allow forward declarations to Just Work.

function f1() { f2(); }
function f2() { f1(); }

This also has some slightly counterintuitive consequences.

print(i); /* valid! */
var i = 1;

However, currently the type inference algorithm is a bit shoddy and there is no dataflow analysis, so what actually happens is (a) you get an error telling you that the compiler was unable to infer the type of i and (b) even if it could it shouldn't let you do the above because you're using i before it is initialised.

Currently these areas are very rough around the edges. File bugs!

I'm trying to compare two objects and it's not working.

There are no automatic methods on interfaces. If you don't declare your interface specifically to support the == and != methods, you won't be able to compare objects of that interface.

type MyInterface =
{
  function == (other: MyInterface): boolean;
  function != (other: MyInterface): boolean;
};

var o1: MyInterface = ...;
var o2: MyInterface = ...;

if (o1 == o2)
  print("Yes!");

Yes, you do need to implement both methods; cowbel doesn't know what any method means, and so doesn't know that they are inverses.

There will eventually be a set of Comparable<> interfaces to make it harder to get this wrong by accident, but they're not there yet.

I've just got this totally incomprehensible error message.

Yeah, sorry. The error diagnostics are currently really manky. They need a lot of work. There should be enough in there to at least let you find the line number where things went wrong.

Any messages referring to something like functionName<1>(2) represent a function signature: the numbers indicate how many type and value parameters the function takes.

Likewise, messages like typeName<1> indicate type signatures.

Anything like Interface42 or {17} refers to an anonymous interface or class type. Getting human-readable names for these is a priority.

A long string that looks like int=int boolean=boolean ...long stuff here... filename.cow:123.4 indicates a specific function instantiation. The sequence at the beginning is the type environment for the instantiated function, and the location at the end is where it was defined.

File bugs!

I've got code that shouldn't compile, but does. / My program fails at the C compilation stage.

Cowbel's type checker works lazily, and only type checks code if it gets used. (This is a consequence of the way functions are inflated.) This means that pretty much anything goes in unused code.

In particular, if an object constructor declares that it implements an interface, it is only checked to make sure that it actually implements the methods in the interface when those methods are called. Which means that if you never call them, they never get checked for...

This is not optimal, and overhauling the type checker is on my list of things to do.

There are also a few edge cases where invalid code of this kind can interact with the dataflow analyser (or rather, lack thereof) and produce invalid output files. As an example:

type Interface =
{
};

function f(): Interface
{
  /* this should not be accepted by the compiler */
}

f();

What's this extern thing?

The extern keyword is used for the C call-out interface. It provides a quick and easy way to interface with external libraries. It's not documented because it's hacky and I'm still not sure it's the right way to do things.

It comes in two varieties:

extern "#include ...";

If this statement is seen in reachable code then the string constant is emitted at the top of the output file.

extern "...C statement...";

Variable references in the string constant are expanded and the entire line of code is emitted into the output file. A variable reference is a substring of the form ${variable}; this will be replaced by a C lvalue referring to the variable's storage. Both local variables and upvalues can be used. Expressions cannot.

For example:

function kill(pid: int, sig: int): (result: int)
{
  extern '#include <sys/types.h>';
  extern '#include <signal.h>';
  extern '${result} = kill(${pid}, ${sig});';
}
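To make the expansion rule concrete, here is a small Python model of it (Python rather than cowbel, and not the compiler's actual code). The function name expand_extern and the frame->... lvalue strings are invented for this sketch; the C names the compiler really generates are not documented here.

```python
import re

# Hypothetical model of ${variable} expansion in an extern statement.
# The C lvalue strings below are made up for illustration.


def expand_extern(template, lvalues):
    """Replace each ${name} in `template` with the C lvalue for `name`."""
    def substitute(match):
        name = match.group(1)
        if name not in lvalues:
            raise KeyError("unknown variable in extern string: " + name)
        return lvalues[name]
    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", substitute, template)


lvalues = {"result": "frame->result", "pid": "frame->pid", "sig": "frame->sig"}
print(expand_extern("${result} = kill(${pid}, ${sig});", lvalues))
# frame->result = kill(frame->pid, frame->sig);
```

Only plain variable names match the pattern, which mirrors the rule that expressions cannot be used inside a reference.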

If the variable is a primitive type, then the lvalue will be to the equivalent C type. Object references become typed pointers. Cowbel strings become pointers to objects of type s_string_t; the runtime function s_string_cdata() will extract a nul-terminated C string from them.

function mkdir(dir: string, mode: int): (result: int)
{
  extern '#include <sys/stat.h>';
  extern '#include <sys/types.h>';
  extern '${result} = mkdir(s_string_cdata(${dir}), ${mode});';
}

In addition, the special primitive type __extern is available. This represents a C void pointer. There is a special hole in the type rules which means that it can be initialised from an integer; but note carefully that such an initialisation will not actually change its value. This is used to store C pointers in cowbel objects.

function Buffer(size: int): Buffer
{
  var ptr: __extern = 0;
  /* At this point, ptr is declared but contains an undefined value. */
  extern '${ptr} = malloc(${size});';
  ...etc...
}

Why am I getting strange 'cannot unify type' errors with this code?

Do you have code that looks like this?

var o = { implements Interface; };
var o1: Interface = o;
var o2 = o;
o2 = o1;

What's happening here is that the type of o is being inferred to be that of an object constructor, which is an anonymous interface (let's call this C). This can be implicitly downcast to an Interface, as is happening in the second line.

o2 gets inferred to be a C as well. But this means that the last line is trying to assign an Interface to a C, which isn't allowed.

To fix this, change the first line to:

var o: Interface = { implements Interface; };

This will ensure that the implicit C gets downcast to an Interface before assignment, which will cause the type of o to be inferred as an Interface and not a C.

The language

Introduction

Cowbel is a block-structured language with a syntax deliberately modelled on Javascript. There should be few surprises. (But see the operator precedence table in the section on expressions below.)

Unlike Javascript, semicolons are mandatory, but only in certain statements. (Statements that terminate with a substatement, such as while, do not require a terminating semicolon.)

Preprocessing

Cowbel files are run through the C preprocessor. As such, standard comments using /* ... */ and // will work, as do include files with #include.

C macros may also be used, but are not recommended.

Statements

The following control-flow statements are supported:

if CONDITION POSITIVE-STATEMENT
if CONDITION POSITIVE-STATEMENT else NEGATIVE-STATEMENT
while CONDITION LOOP-BODY-STATEMENT
do LOOP-BODY-STATEMENT while CONDITION;
for VARIABLE = FROM-EXPRESSION, TO-EXPRESSION [, STEP-EXPRESSION] LOOP-BODY-STATEMENT
continue
break

continue and break work in the obvious manner.

Note well! The range in for is treated as [from, to). That is, the target number is exclusive. And the loop uses equality to compare against this target number. This means that the following code will loop forever:

for i = 0, 10, 8
{
  print("i will never be equal to 10");
}
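The loop rule can be modelled in a few lines of Python (a sketch of the behaviour described above, not cowbel code). The helper name cowbel_for is made up, and the default step of 1 is an assumption based on the FizzBuzz example. islice is only there to cut off the last call, which would otherwise never finish.

```python
from itertools import islice

# Python model of the for-loop rule: half-open range, equality test.


def cowbel_for(start, stop, step=1):
    """Yield start, start+step, ... until the value is exactly `stop`."""
    i = start
    while i != stop:      # equality test, not i < stop
        yield i
        i += step


print(list(cowbel_for(1, 5)))                  # [1, 2, 3, 4]
print(list(cowbel_for(0, 16, 8)))              # [0, 8]
print(list(islice(cowbel_for(0, 10, 8), 5)))   # [0, 8, 16, 24, 32]
```

The loop only terminates when the step lands exactly on the target, so the distance from the start to the target needs to be a whole multiple of the step.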

Arbitrary jumps are supported using goto and labels:

LABEL:
goto LABEL

Jumps are allowed only within the current scope or to an enclosing scope, and then only within the current function.

Variables can be declared anywhere.

var VARIABLE-NAME: TYPE = INITIALISER-EXPRESSION
var VARIABLE-NAME = INITIALISER-EXPRESSION

The type declaration is optional. If no type is specified, the variable's type is inferred from the initialiser. The initialiser is mandatory.

Multiple variables may be declared, and then initialised from an initialiser list.

var VAR1: TYPE1, VAR2: TYPE2, VAR3: TYPE3 = INIT1, INIT2, INIT3
var VAR1, VAR2, VAR3 = INIT1, INIT2, INIT3

Note that this is not how Javascript handles things.

Assignments in cowbel are considered to be statements, not expressions.

VAR = VALUE-EXPRESSION
VAR1, VAR2, VAR3 = VALUE1, VALUE2, VALUE3

When assigning to multiple values, the values are all evaluated before assignment, to allow this construct:

a, b = b, a

Multiple values may be returned from functions and methods:

VAR1, VAR2, VAR3 = FUNCTION-NAME(ARGUMENTS...)
VAR1, VAR2, VAR3 = OBJECT-EXPRESSION.METHOD-NAME(ARGUMENTS...)

There must be no more output variables than values returned by the function or method. If there are fewer, the extra return values are discarded.
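The two rules for multiple assignment (evaluate everything first, and drop surplus return values) can be sketched as a Python model. This is not cowbel code; the assign helper and the env dictionary are invented for illustration, and the model makes no distinction between a plain value list and a function's return values.

```python
# Python model of multiple assignment; `assign` and `env` are hypothetical.


def assign(env, names, values):
    """Bind `names` to `values` in `env`.

    `values` is already fully evaluated by the time this runs, which is
    what makes a swap safe. Surplus values are dropped; too few is an error.
    """
    if len(names) > len(values):
        raise ValueError("more output variables than values")
    for name, value in zip(names, values):
        env[name] = value


env = {"a": 1, "b": 2}
assign(env, ["a", "b"], [env["b"], env["a"]])   # a, b = b, a
print(env)                                      # {'a': 2, 'b': 1}


def f():
    return [10, 20, 30]                         # three return values


assign(env, ["x", "y"], f())                    # x, y = f(); 30 is discarded
print(env["x"], env["y"])                       # 10 20
```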

In addition, there are statements for declaring functions and types, which are described below; and a bare expression may also be used as a statement.

Functions

Functions may be declared as follows:

function FUNCTION-NAME(INPUT-ARGUMENTS): (OUTPUT-ARGUMENTS)
   FUNCTION-BODY

For example:

function f1(i1: int, i2: int): (o1: int, o2: int) {}
function f2(): () {}

Values are returned by assignment to a named output value.

function f(): (o1: int, o2: int)
{
  o1 = 1;
  o2 = 2;
}

The return statement can be used to exit a function early.

function f(): (o1: int)
{
  o1 = 1;
  return;
  o1 = 2; /* not reached */
}

As a special case, return can also be used to specify a value for a function that has exactly one output value.

function f(): (o1: int)
{
  return 1;
}

As a special case, functions that return no values may omit the output argument list. These are equivalent:

function f1(i1: int): () {}
function f2(i1: int) {}

As a special case, functions that return one value may use an abbreviated output argument list. These are equivalent:

function f1(i1: int): (i2: int) {}
function f2(i1: int): int {}

In this situation, the output argument becomes anonymous, and can only be assigned to with the second form of the return statement, described above.

Expressions

Expressions consist of a set of nested function or method calls, grouped by parentheses. The order of precedence is as follows:

lowest   infix operators
         prefix operators
         method calls or function calls
highest  parentheses, constants, identifiers

Note that this means that all infix operators have the same precedence, which means that this expression:

1 + 2 * 3

...is evaluated strictly left to right and therefore may not produce the result you expect.
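To see the difference in numbers, here is a small Python model of flat, left-to-right infix evaluation (Python, not cowbel; the OPS table and eval_left_to_right are names made up for this sketch).

```python
import operator

# Python model of infix evaluation with a single precedence level.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_left_to_right(tokens):
    """Evaluate [operand, op, operand, op, ...] with no precedence levels."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, tokens[i + 1])
    return result


print(eval_left_to_right([1, "+", 2, "*", 3]))   # 9, i.e. (1 + 2) * 3
print(1 + 2 * 3)                                 # 7 under conventional precedence
```

To get the conventional result, group the multiplication explicitly with parentheses.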

Operators become ordinary method calls. Infix operators call the named method on the left-hand operand with one argument (the right-hand operand); prefix operators call it with no arguments.

Traditional method calls have the following syntax:

OBJECT-EXPRESSION.METHODNAME(ARGUMENTS...)

There is (currently) no short-circuiting inside expressions.

Types

cowbel supports the following primitive types.

boolean    true/false boolean
int        32-bit integer
real       64-bit floating point
string     immutable UTF-8 string

It is possible to assign type aliases with the type statement.

type NEWNAME = OLDNAME;

In a type context, a {...} block defines a new interface type.

type NAME = { INTERFACE-STATEMENTS... };

Currently the only interface statement supported is a method declaration, which is the same as a function declaration with no body. For example:

type HasString =
{
  function toString(): string;
};

Note the trailing semicolon above!

The only implicit type conversion supported is from an interface to a subinterface.

Operator methods are defined in the obvious way:

type HasAddition =
{
  function + (other: HasAddition): HasAddition;
};

Constants

Integer constants can currently only be written in decimal.

Real constants must contain a decimal point (to distinguish them from integer constants).

Boolean constants must be either true or false.

String constants must be delimited with either '...' or "...". The following escape characters are supported:

  • \n = newline (ASCII 10)
  • \r = carriage return (ASCII 13)
  • \", \', \\ = ", ' or \, respectively

Objects

Cowbel considers scopes and objects to be the same thing: a block scope is simply an object constructor which discards its result, while an object constructor is a block scope where the result is recorded. Functions in the scope become methods and variables in the scope become instance data. Other statements in the scope are executed in the usual fashion when the object is created.

For example:

var object =
{
  function greet()
  {
    print("Hello, world!");
  }
};

object.greet();

An object constructor returns a value which has the type of an anonymous interface implementing all the methods in the scope. This type has no name and cannot be referred to.

The object can be declared to implement one or more named interfaces using the implements keyword.

implements TYPE;

For example:

var object =
{
  implements HasString;

  function toString(): string
  {
    return "object";
  }
};

var asinterface: HasString = object;

It is possible to use the implements keyword in a block scope, but not useful.

Scoping

Variable, function and type declarations in cowbel are hoisted to the top of the scope in which they are defined. This means that no special action is needed to forward declare functions.

That said, referring to variables before they have been initialised can be problematic.

print(i); /* i's value is invalid */
var i = 1;

[Note: the above code should be rejected by the compiler; but currently there is no dataflow analysis, so it's accepted. Please don't do this.]

Variables may be referred to and modified in nested functions and objects in their scope. True upvalues are supported.
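For readers unfamiliar with the term, an upvalue is a variable from an enclosing scope that a nested function reads and writes directly rather than copying. The idea can be shown with Python closures (an analogy only, not cowbel code; make_counter and its inner functions are invented for this sketch).

```python
# Python analogy for upvalues: nested functions share one enclosing variable.


def make_counter():
    count = 0                 # variable in the enclosing scope

    def increment():
        nonlocal count        # refer to the enclosing variable, not a copy
        count += 1

    def value():
        return count

    return increment, value


increment, value = make_counter()
increment()
increment()
print(value())   # 2
```

Both nested functions see the same count, so a change made through one is visible through the other.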

The runtime library

Introduction

The cowbel runtime library is a work in progress and mainly exists to support the test suite. If you actually want to write a real program in cowbel, you're advised to take your own copy of the files you need, to protect yourself from future changes.

Currently the actual library is so much in flux that the only sensible documentation is in the header files themselves. Formal documentation here will follow once I've written a documentation generator.

Application.ch

Array.ch

Buffer.ch

File.ch

Map.ch

Maybe.ch

PCRE.ch

Set.ch

SimpleIO.ch

Stdlib.ch

StreamIO.ch

Primitive types

Unlike cowbel 0.1, primitive types are now represented by interfaces and are described in Stdlib.ch above.