I recently got nerd sniped by fixed point Mandelbrot rendering, which culminated in a toy Mandelbrot program for the BBC Master which can render a 90x180 Mandelbrot in twelve seconds, sort of. This is running on a 2MHz 65C02 with 128kB of RAM.
What is it?
So… it’s a complete stunt. The algorithm will only render the above picture, it can’t pan or zoom, it can’t show conventional colour contours, there are bounding problems, etc, etc. But it’s pretty fun.
It’s based on a blog post on Nothing Untoward where the author writes an 88-byte Mandelbrot renderer for 80186 PCs. I needed one for a two-second clip in my videogames on the IBM PC 5140 video, but this one wouldn’t run on an 8088, so I had to hack it around a bit. It’s rather elegant; it uses 9.7 fixed point, carefully scaled so that two numbers multiplied together cause an overflow if the result is greater than 4; this is important, because that allows very cheap bailout detection by simply checking the overflow flag.
Then, after a conversation on the vcfed forums I got thinking about whether it would be possible to use 2.6 fixed point instead. This would allow values to be stored in a single byte. You lose a lot of precision, but in turn you get to, e.g., use a 64kB brute force multiplication lookup table, which drastically improves the performance.
This program is the result.
The image is 90x180, which makes for 16200 pixels, which means it’s about 1500 cycles per pixel on average.
How does it work?
It combines several fairly obvious techniques.
64kB multiplication lookup table (stored in the BBC Master’s sideways RAM). This lets any two fixed point values to be extremely cheaply multiplied together by simply looking them up. The fixed point adjustment shift is baked in, and overflows are signalled with the value
0x7f(the largest number representable, 1.984375).
recursive subdivision. Any area in a Mandelbrot which can be completely surrounded by a given colour can be filled in that colour, because I know that there are no features inside (with the one exception that the set as a whole sits in an infinite sea of colour). This lets me avoid lots of work, in particular most of the lake, which is very slow.
pixel cache. I use video memory itself as a cache of which pixels I’ve already calculated, so I don’t need any additional data storage to make the recursive subdivision work. (The screen mode I’m using allows me to define two different kinds of black, which I can use to distinguish between actually-black and not-rendered-yet-black.)
top/bottom mirroring. The Mandelbrot set is symmetric across y=0, so I can render the top half and just copy the image to the bottom. This was much more fiddly than it looks.
It also gleefully uses the extra 65C02 instructions — pointer indirection without needing to waste an index register on loading a 0 is so helpful…
Unfortunately there are a lot of limitations.
miniscule precision. You can’t zoom in, because there’s no data there! The image is 256 pixels high (128 wide, because MODE 2 pixels are 2x1), and my numbers can only represent 256 different numbers.
wraparound and overflow issues towards the edge of the image. Part of the rendering process involves squaring the X and Y coordinates. As my number format can only represent -1.984375..1.984375, then any coordinate where will cause an overflow. So the resulting useful image is only 180 logical pixels square (and so chops off the The Needle. And even within those bounds, there’s numeric wraparound which I couldn’t be bothered to figure out how to detect (resulting in the artifacts around the edge of the image).
weird contours. Normally when rendering a Mandelbrot one tests for bailout by testing for . But I can’t do that, because I can’t represent 4s. Instead I use Nothing Untoward’s trick and test and individually for overflow. This works absolutely fine for testing the bounds of the Mandelbrot Set itself, the black lake, but it leads to weird-looking colour contours.
As an aside, I did try using , because I can just add the two values together and look at the overflow flag, but I get this:
…which is interesting, but not really what I wanted.)
Can it be improved?
It’s probably possible to shave a bit off the render time, but I doubt it would be worth it — obvious places to start are faster rectangle fills, and more intelligent subdivision to reduce the amount of work needing done. Rendering in MODE 1 would probably lead to more pleasing images due to the higher resolution but I’d be limited to three colours, and the conversion wouldn’t be trivial, so I’m not going to bother.
I don’t think the byte-sized fixed point numbers are going to lead anywhere interesting; there’s simply not enough precision to be worthwhile. The next step up is 16-point fixed point, but that’s substantially more complex (and therefore much slower) on an 8-bit processor like the 6502.
Anyway, this was fun to do and I hope it’s interesting to read about, and if anyone can do anything cool with the code, send me a pull request!