I’ve just released version 0.5 of Cowgol, my new Ada-inspired programming language. It’s not just for the 6502 on the 6502, it’s for the Z80 on the Z80 as well! And, if you’re masochistic, for the 6502 on the Z80 and for the Z80 on the 6502 too (I build all the cross compilers).

See the main Cowgol page for more information.

The big new feature is Z80 code generation. Oh, boy. I actually thought this would be easier than 6502 code generation; I was so wrong. (I've written a separate blog post about why, and I found plenty of other bizarre things besides.) But I did manage to beat it into workingness. It needs a massive optimisation pass before the generated code is anything other than shamefully embarrassing, but I'm proud to say that the entire compiler will run on CP/M and generate CP/M binaries. It's not self-hosting, though (the binaries are just too big).

The Cowgol distribution contains a CP/M emulator which will let you try it from the comfort and convenience of your own PC!

Code generation and the Z80

In essence, the Z80 is so asymmetric that generating good code for it is beyond Cowgol's puny little code generator. For example, take a simple 16-bit assignment, b := a:

• Is a 16-bit register free? If so, I can just load a into it and write it out to b again. But remember that HL is the cheapest, BC/DE are a byte longer, and IX/IY another byte longer than that. So I want to use HL, but that'll evict something else.

• No 16-bit registers? I can do it with A instead, reading and writing a byte at a time, with ld a, (a+0); ld (b+0), a; ld a, (a+1); ld (b+1), a. Of course, it's twice as many bytes as the 16-bit version. And you have to use A; it's the only option (the other 8-bit registers can't be directly read from or written to memory).

• Wait, I want to read through a pointer instead? Pointers to 16-bit values have to be dereferenced as 8-bit pairs, low byte first, so ld hl, (a); ld e, (hl); inc hl; ld d, (hl). Using a register other than HL is slower. Using an index register lets us do ld ix, (a); ld e, (ix+0); ld d, (ix+1), which doesn't corrupt our pointer register, but is even slower than that.

• Want to dereference the pointer into an index register? You can't. (Not without using undocumented instructions.) You have to dereference it into an ordinary register pair and then do hacks to get it into the index register, the cheapest of which is push hl; pop ix.

• Want to do arithmetic as well? 8-bit arithmetic has to happen through A. 16-bit arithmetic (limited to addition and subtraction) has to happen through HL, except for increments and decrements, which can be done to anything. Want to do anything other than addition or subtraction on 16-bit values? Do it a byte at a time in A; for example, DE := DE | BC is ld a, e; or c; ld e, a; ld a, d; or b; ld d, a.
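To make the trade-off in the list above a bit more concrete, here is a toy cost model in Python. It is not Cowgol's actual code generator. It picks the cheapest way to compile b := a when a and b are absolute addresses. The byte counts come from the standard Z80 encodings for the absolute loads and stores. Some details are assumptions for illustration: the name cheapest_copy, the rule that a busy register is preserved with a push/pop pair, and the tie-break. IX/IY are left out to keep it short.

```python
# Toy cost model (not Cowgol's real code generator) for compiling the
# 16-bit copy b := a, where a and b are absolute addresses.
# Costs are instruction sizes in bytes from the standard Z80 encodings.

PAIR_COPY_BYTES = {
    "HL": 3 + 3,  # ld hl,(nn) ; ld (nn),hl   (unprefixed)
    "DE": 4 + 4,  # ld de,(nn) ; ld (nn),de   (ED-prefixed)
    "BC": 4 + 4,  # ld bc,(nn) ; ld (nn),bc   (ED-prefixed)
}
A_COPY_BYTES = 4 * 3  # two ld a,(nn) and two ld (nn),a, 3 bytes each

# Assumption: a busy register is preserved with push/pop around the copy
# (1 byte each), rather than by any smarter eviction strategy.
SPILL_BYTES = 1 + 1


def cheapest_copy(free):
    """Return (register, bytes) for b := a, given the set of free registers."""
    options = []
    for reg, cost in list(PAIR_COPY_BYTES.items()) + [("A", A_COPY_BYTES)]:
        spilled = reg not in free
        total = cost + (SPILL_BYTES if spilled else 0)
        options.append((total, spilled, reg))
    # Cheapest first; on a tie, prefer not spilling, then alphabetical order.
    total, _, reg = min(options)
    return reg, total
```

With everything free it picks HL at 6 bytes. With only A free, spilling HL (8 bytes) still beats going through A (12 bytes). With DE and A free, DE and a spilled HL tie at 8 bytes, and only the tie-break rule chooses DE. Even this tiny model has no single obvious answer.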

So the answer to the simple question "what register do I use for this?" is "it's complicated". The 6502 was very limited in what it could do, but at least that meant there was always one clear best approach.

Everyone needs a laugh, so here’s the print() routine in Cowgol:

sub print(ptr: [int8])
    loop
        var c: int8 := ptr[0];
        if c == 0 then
            return;
        end if;
        print_char(c);
        ptr := ptr + 1;
    end loop;
end sub;


And here is the code, annotated by me (this was actually compiled on an emulated CP/M system):

X010f:  ld      hl,(X0173) --- get ptr
        ld      a,(hl)     --- dereference
        ld      (X0175),a  --- assign to c
        cp      0          --- test
        jr      nz,X011b
        ret                --- return if end of string
X011b:  ld      a,(X0175)  --- reload c
        ld      (X0176),a  --- assign to print_char()'s parameter
        call    X0103      --- call print_char()
        ld      hl,(X0173) --- \
        ld      bc,X0001   --- |
        ld      a,l        --- | all this
        add     a,c        --- | is just
        ld      e,a        --- | adding one
        ld      a,h        --- | to ptr
        adc     a,b        --- |
        ld      d,a        --- |
        ld      (X0173),de --- /
        jr      X010f
        ret


So, yeah, that addition is horrifying, but fixing it is a simple micro-optimisation, and that's next on my list of things to do. The rest of the code isn't that bad. Just like on the 6502, a peephole optimiser would do wonders. Still, baby steps.
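Here is a sketch of the kind of rule such a peephole optimiser might contain. It is hypothetical, not Cowgol's code: a Python pass that spots the add-one sequence from the listing and replaces it with inc hl. It assumes each instruction has been normalised to one space after the mnemonic. It also assumes A, BC, DE and the flags are dead afterwards. That appears to hold for this listing: nothing reads BC or DE, and A and the flags are recomputed at X010f. A real optimiser would need liveness information to know it, especially since inc hl, unlike add, doesn't set the flags.

```python
# Hypothetical peephole rule (not Cowgol's code): rewrite the generator's
# "ptr := ptr + 1" sequence as ld hl,(X) / inc hl / ld (X),hl.
# Assumes instructions are normalised to "mnemonic operands" with one space,
# and that A, BC, DE and the flags are dead after the sequence.

# The middle of the pattern: add BC (= 1) to HL a byte at a time, into DE.
ADD_ONE_VIA_A = [
    "ld bc,X0001",
    "ld a,l", "add a,c", "ld e,a",  # low byte:  E = L + 1, carry on overflow
    "ld a,h", "adc a,b", "ld d,a",  # high byte: D = H + 0 + carry
]


def peephole(code):
    """Return a copy of code with every add-one sequence replaced by inc hl."""
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 9]
        addr = code[i][len("ld hl,"):]  # e.g. "(X0173)" if this is ld hl,(...)
        if (code[i].startswith("ld hl,(")
                and window[1:8] == ADD_ONE_VIA_A
                and window[8:] == ["ld " + addr + ",de"]):
            # Same address loaded and stored: the whole thing is addr += 1.
            out += [code[i], "inc hl", "ld " + addr + ",hl"]
            i += 9
        else:
            out.append(code[i])
            i += 1
    return out
```

That shrinks the sequence from 16 bytes to 7. The replacement is ld hl,(nn) and ld (nn),hl at 3 bytes each plus a 1-byte inc hl. The original is the 3-byte ld bc,X0001, six 1-byte instructions, and the ED-prefixed 4-byte ld (nn),de, on top of the same 3-byte load.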

(Note that the code generator doesn’t remember the value of registers between basic blocks. That requires global register analysis and that’s really hard and memory intensive, so it’s much easier not to bother.)
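To show what not remembering registers between basic blocks means in practice, here is a minimal Python sketch of block-local redundant-load elimination for A. It is my own illustration, not Cowgol's implementation. It tracks which absolute address A is known to mirror and drops an ld a,(X) when A already holds that value. It forgets everything at a label, at a call, or at any instruction it doesn't know to be harmless. The (Xnnnn) address format follows the disassembly above. The function name and the list of harmless instructions are assumptions.

```python
import re

# Absolute addresses are assumed to look like the disassembly's "(X0175)".
LOAD_A = re.compile(r"ld a,(\(X[0-9a-f]{4}\))")
STORE_A = re.compile(r"ld (\(X[0-9a-f]{4}\)),a")
# Instruction prefixes assumed to change neither A nor memory.
KEEPS_A = ("cp ", "jr ", "jp ", "ret", "ld hl,", "ld bc,", "ld de,")


def drop_redundant_a_loads(code):
    """Block-local: remove 'ld a,(X)' when A is already known to hold (X)."""
    out = []
    a_holds = None  # the memory operand A currently mirrors, if known
    for insn in code:
        load = LOAD_A.fullmatch(insn)
        store = STORE_A.fullmatch(insn)
        if insn.endswith(":"):
            # A label starts a new basic block that may be entered from
            # elsewhere; without global analysis, forget what A holds.
            a_holds = None
        elif load and load.group(1) == a_holds:
            continue  # redundant reload: A already has this value
        elif load or store:
            a_holds = (load or store).group(1)
        elif not insn.startswith(KEEPS_A):
            a_holds = None  # calls and unknown instructions may clobber A
        out.append(insn)
    return out
```

Given the start of the print() listing, it leaves the ld a,(X0175) at X011b alone. X011b begins a new basic block, so the sketch has forgotten that A still holds c, even though the only jump to X011b in this listing is the jr nz just before it. Working that out takes exactly the global analysis the code generator skips.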