As it’s the 50th anniversary of the Apollo 11 moon landings, I thought this would be a great opportunity to live code a demonstration of using my new compiler to write a simple lunar lander game… but first, I’m going to port the compiler to generate code for an actual lunar lander. Yes, my target platform is the Apollo Guidance Computer used by the Apollo program. And it is weird.

I’m actually using the Virtual AGC simulator for the AGC, as there are very few running AGCs in the world. One, to be precise. There may be one other functional AGC somewhere in interplanetary space, but as it hasn’t been powered on for over fifty years it could probably use some work. Plus, access is hard.

It’s bizarre: 15-bit words, 1s-complement arithmetic, asynchronous logic mostly implemented with discrete logic gates wired together by hand, ferrite core memory and ROM, and the weirdest instruction set you’ll ever find in a mainstream computer. (Is the AGC mainstream? I SAY YES.) It was a miracle of miniaturisation, and its development is largely responsible for the integrated circuit revolution: at one point, apparently, 25% of all ICs produced in the world were going into the AGC development programme.

That said, it is a proper von Neumann architecture computer, capable of self-reprogramming and executing code out of RAM. In many ways it feels startlingly modern. It was interesting (if a little frustrating) to port to.

So, the game?

lunarlander.agc.bin.gz 1 kB

The compiled binary for the game.

First, download and decompress the binary. Then hand-weave into core rope memory and assemble into an AGC module. Locate a running AGC and insert it into the common-fixed bank 2 slot. Attach a set of LM simulated peripherals to the AGC and power on. If you have a real LM, all the better.

To play, press ENTER. The DSKY will show you (top to bottom) your current thrust level, your altitude, vertical velocity, and remaining fuel. Use the pitch control of the Rotational Hand Controller (a.k.a. the joystick) on the LM to control your throttle — push forward for more thrust, center for less thrust. Pay attention to the altitude and velocity alarm lights; if the velocity alarm is lit, you’re going too fast to land safely! Once landed (or crashed) press ENTER to reset and then ENTER again to replay.

(Note. Not certified for flight use. Do not attempt to land your LM on the moon while having the lunar lander game loaded into the AGC. Load Luminary, the real lunar lander software package, instead.

A DSKY simulation screenshot.

Alternatively, if you don’t have an AGC or LM handy to run it on, you can use Virtual AGC instead. Virtual AGC is a little complex to set up, so refer to the instructions there; you’ll need the ACA and DSKY simulators, and yaAGC must be configured in LM mode rather than CM mode. (Because the command module didn’t have a joystick.)

All the source code for the compiler and the game itself are available from my github repository below. It’s all BSD 2-clause licensed.

The compiler source

Find the compiler source here. It'll probably be a pain to make work.

Some notes on the machine

So this was way more work than I thought. I was originally expecting to get this done in a few hours. Turns out the AGC instruction set was odder than I expected.

The overall architecture

So, by modern eyes, the AGC is essentially a 15-bit word system with three registers and a 12-bit address bus, meaning it could address only 4096 words of memory (15-bit words). It had 2048 words of RAM and a staggering 36,864 words of ROM, both implemented using ferrite core technology, and a complex banking system for accessing it all. There was also a 9-bit I/O address bus and instructions for accessing it.

In many ways it felt very modern, with an extremely familar-looking interrupt system and counters; in some ways it didn’t, such as the asynchronous logic implementation which meant that each instruction executes in a variable amount of time depending on how much work it was doing, or the way the registers are all memory mapped, so the accumulator can be read from address 0. And there is no support for a stack at all.

The instruction set is elegantly RISC, with each 15-bit word encoding a 3-bit opcode and 12-bit address payload (constants were always referred to by address). However, it clearly had been expanded many times during development, so some instructions steal some bits from the top of the payload to allow an expanded opcode and as a result can only address the bottom 1024 words of memory. Then they still ran out of opcodes, so added an EXTEND instruction as a prefix word, allowing two-word instructions.

Needless to say, the AGC can’t handle bytes at all.

(It’s a very interesting machine. I think that maybe a cleaned up version of the ISA, expanded to 32 bits, would make a nice very-low-transistor-count embedded CPU. You might possibly want to switch to 2s-complement, though.)

Registers and instructions

The instruction set is strange.

The three registers are A, the accumulator; L, the lower accumulator, used for the other parameter of three-word operations; and Q, the return address. ARM and MIPS users will recognise this as a link pointer. Subroutines are called with a relatively normal TC label, which stores the return address in Q, and returned from with RETURN, which just jumps to Q. To call nested subroutines, you have to store Q somewhere. The A and Q registers are special in that they are 16 bits wide rather than 15. The extra bit was used to track arithmetic overflows.

Some instructions are fairly normal, if oddly named: CA label loads the value at that address into A. Some aren’t: the main data storage instruction is TS label, which stores A… but it’s also a jump instruction; if A has overflowed, it skips the next instruction. This allows:

    TS something
    TCF no_overflow_occurred
    <handle overflow here>

(TCF is the normal jump instruction.) Obviously if you can guarantee that the value cannot overflow, you don’t need the jump instruction. This idea makes me really uncomfortable as a source for subtle bugs, because if it did overflow, your program would just skip one instruction and keep running, with hilarious results…

Then there’s the CCS instruction, which is the magnitude comparison and loop instruction. This loads its parameter into A, and then jumps to one of the four following instructions based on the value, but with A diminished. (Diminished means: moved one integer closer to zero. So, 5 becomes 4, but -5 becomes -4.)

    CCS variable
    TCF greater_than_zero
    TCF positive_zero
    TCF less_than_zero
    TCF negative_zero

(1s complement arithmetic allows signed zeroes.) Looking at the Luminary source code, I see situations where they could guarantee that variable was always greater than zero or positive zero, and so they skipped the other jumps. Like with TS, that makes me very uncomfortable… but it does allow for nice tight loops:

    CA memory_block_size   ; A := size of block that we want to clear
loop:
    ZL                     : L := 0
    INDEX A                ; add value of A to next instruction
    LXCH block             ; swap L with *(block+A) (this is the only way to store L without touching A)
    CCS A                  ; A-- and depending on old A jump to...
    TCF loop               ; if A *was* >0, loop again
    ...                    ; if A *was* +0, jump here and exit the loop

What’s that INDEX instruction, you might ask? Read on.

INDEX

The instruction set has no provision for pointer handling. You can’t (normally) dereference them and the assembler doesn’t (normally) allow you to emit a constant address of something to memory. It’s my belief that the original programmers never used them. The instruction set doe have robust support for look-up tables, via that INDEX instruction, but I don’t think they ever actually took the address of anything, except maybe to do indirect jumps (more on that later).

What INDEX does is read its argument and then execute the next instruction with that value added to the opcode. So, in the example above, LXCH actually operates on the value at block + A rather than just block. This allows things like:

    TC get_key               ; read a key and return it in A
    INDEX A
    CA key_translation_table ; A := translated key
    TC do_something_with_it
    ...
key_translation_table:
    .dw 1
    .dw 2
    .dw 3
    ...

But it’ll work on any instruction, not just CA, and on any value; so you can set the opcode bits too. This allows self-modifying code without needing to modify anything!

    TC compute_instruction   ; get an instruction opcode in A
    INDEX A
    .dw 0

I managed to abuse this into allowing pointer dereferences by simply doing INDEX A; CA 0, so adding A to 0 to read from the absolute address in A. Stores worked, too, by doing INDEX A; TS 0. However, trying to actually get the address of anything was hard, because the original assembler didn’t have anything like .dw label. I eventually managed to bodge around this by finding the instruction whose opcode bits were 0. (It was TC.) Emitting this in a data section would end up writing the raw address.

Multiplication and division

The AGC does have hardware multiplication and division. You can multiple two 15-bit numbers to get a 30-bit result, or divide a 30-bit value by a 15-bit value to get a 15-bit result (and 15-bit remainder).

The multiplication instruction is relatively sane, but the division one is strictly limited. It can only compute \(x/y\) when \(|x| > |y|\). Anything else, you get garbage, including in the \(x=y\) case. This hit me several times and trying to deal with all the sign issues, combined with the dealing with 1s-complement sign issues, was really painful. I ended up with a massive hand-written helper routine for dividing two numbers to try to work around it; I’m sure it’s wrong, and I haven’t written the equivalent modules one yet.

1s-complement numbers

Modern machines use 2s-complement numbers:

+2 = 000 0000 0000 0010
+1 = 000 0000 0000 0001
 0 = 000 0000 0000 0000
-1 = 111 1111 1111 1111
-2 = 111 1111 1111 1110

However, the AGC uses 1s-complement instead:

+2 = 000 0000 0000 0010
+1 = 000 0000 0000 0001
+0 = 000 0000 0000 0000
-0 = 111 1111 1111 1111
-1 = 111 1111 1111 1110
-2 = 111 1111 1111 1101

This has two big advantages over 2s-complement arithmetic, in that \(~x = -x\), and that it also allows both positive and negative zeroes. However, it also has two huge disadvantages: the gap between +3 and +1 is not the same as the gap between +1 and -1, …and that it allows both positive and negative zeroes.

I spent a great deal of time wrestling with this, trying to encode opcodes for the DSKY properly. It didn’t help that I got it mixed up with Signed Magnitude Representation:

+2 = 000 0000 0000 0010
+1 = 000 0000 0000 0001
+0 = 000 0000 0000 0000
-0 = 100 0000 0000 0000
-1 = 100 0000 0000 0001
-2 = 100 0000 0000 0010

My several explanations of 1s-complement in the video are simply wrong. Please disregard.

Other quirks

  • The registers are exposed on I/O ports! So reading from I/O port 0 gives you the accumulator.

  • This is useful, because the AGC has no OR or XOR instructions! But you can read from an I/O port and logically or or xor into the accumulator, so this works:

        CA rhs                     ; A := rhs
        TS L                       ; L := A
        CA lhs                     ; A := LHS
        ROR L                      ; A := A | L
    

    (There is a MASK instruction for doing ANDs.)

  • L or Q can’t be directly read from or written to memory! But you can swap them with memory (using LXCH or EXTEND QXCH. (You can swap A, too, with XCH.)

  • There are no shift instructions! But there are memory-mapped shift registers. Write a value to address 22 and it’ll be rotated left. But some instructions (like INDEX) will read and then write back their operand, thus resulting in it being shifted again!

  • There is one instruction called EDRUPT which is so called because it was requested by Ed Smally, and used only by him. It appears to read but then ignore its operand and then fake an interrupt to the next address, first executing the instruction whose opcode is in A. I think. Its purpose is obscure.

  • It is possible to call the accumulator as a subroutine. This writes the return address into Q, at address 2. As the encoding for TC is 0, that address is a valid jump instruction. So, you get to execute the two instructions in A and L and then return to where you were. This is… useful, I guess?

  • This is the only CPU I’ve ever encountered with a dedicated radar interrupt. Plus, I want to know what that memory-mapped register simply called ‘thrust’ does.

Interesting milestones

These are also on the video itself, for clickability.

  • 00:00:00 Introduction
  • 00:04:27 DAY 1 START
  • 00:28:00 I manage to crash VirtualAGC
  • 00:32:24 First subroutine call (with no parameters)
  • 00:35:50 We add our first (and only) type — an int15
  • 00:39:12 Start working on expression handling and the virtual stack
  • 00:45:40 We assign a value to a variable! Pity we can’t represent values yet
  • 00:47:50 Start implementing the constant pool
  • 00:53:50 Actually run code for loading our variable
  • 00:59:15 We can load variables too
  • 00:59:42 Start adding numbers
  • 01:09:30 Planning how to write to the DSKY
  • 01:14:50 Passing parameters to subroutines
  • 01:32:35 Discovery that YUL can’t store addresses as constants
  • 01:36:49 DEBUGGING BREAK — and solve previous problem through creative evil
  • 01:43:30 Subroutine calls with input parameters work
  • 01:47:00 Adventures with multiplication
  • 01:56:30 Wrestling with the DSKY
  • 02:08:15 Actually get something displayed; conditionals and loops work
  • 02:19:40 Setting individual DSKY digits works; start work on outputting numbers
  • 02:30:10 More problems with labels
  • 02:42:15 Wrestling with division
  • 03:22:55 TEA BREAK
  • 03:33:28 First successful division!
  • 03:37:00 Discover that my virtual AGC keeps resetting; interrupt system debuggin
  • 03:48:30 More fighting with the DSKY
  • 03:58:00 Discover that my DSKY parameter generation is overflowing
  • 04:22:35 1s-complement arithmetic is hard (doesn’t help that I got it completely wrong)
  • 04:24:00 Start implementing logic operations
  • 04:33:45 Start go off on a total wild goose chase due to not understanding 1s complement
  • 05:10:00 Realise my mistake and give up
  • 05:24:10 SLEEP. DAY 2 START
  • 05:26:10 Realise I need to keep my own copy of the DSKY state
  • 05:54:35 Revenge of the labels!
  • 06:09:00 Finally, I’m writing numbers to all the DSKY displays! The right numbers!
  • 06:20:20 And now we can display negative numbers!
  • 06:20:55 Start writing the actual game
  • 06:26:00 First attempt to read the Rotational Hand Controller pitch control. This fails
  • 06:45:00 DEBUGGING BREAK — I give up wrestling with the RHC and the DSKY and do some offline debugging. All better now
  • 06:67:12 Writing the main game loop
  • 06:53:30 First run of our prototype game. It’s very broken
  • 07:01:50 Discover a nasty arithmetic bug that just happened not to show up in our tests
  • 07:08:10 The bride of division… from the swamp!
  • 07:28:30 Division works now. Maybe?
  • 07:29:33 Start debugging memory corruption issues
  • 07:35:45 Octal: the cause of, and definitely not the solution to, all of life’s problems.
  • 07:38:30 Our game now looks like a bit like a game
  • 07:40:00 Turns out that 15 bits isn’t enough precision for our game, so fake some more
  • 07:45:15 Our game is now playable, although it needs lots of tuning. Also being able to control the engine would be nice
  • 07:48:05 The player’s throttle control (the RHC) works. It’s a game!
  • 07:52:40 The game has a bit of challenge to it. I still land successfully due to leet skills
  • 07:53:30 Start implementing the new game / replay loop, controlled from the DSKY
  • 08:06:34 You can crash and land and be prompted for a new game
  • 08:08:50 The game is done!
  • 08:09:00 I decide to add one more effect, and immediately run out of block 2 fixed memory space
  • 08:38:14 The moment when I realise I’ve run out of block 2 fixed memory space
  • 08:50:50 The moment when I discover that block 3 fixed memory doesn’t kick in automatically
  • 09:00:33 I manage to shrink the program to just fit in block 2 (using all the available space)
  • 09:00:50 THE GAME IS FINISHED