I’m not sure why, but this puzzle and the next are ones where I felt an itch to shave off some cycles.
The straightforward way to write this program looks like this:
Here the unit keeps track of which row it’s drawing with a value swapped in and out of BAK, and a loop sends out the color code for white (3) thirty times. From top to bottom, the unit outputs [0, 0, 3, …, 3, -1] to draw a white line completely across the top row, then [0, 1, 3, …, 3, -1] to draw the next row, and so on. This runs in 2,334 cycles.
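To make that output stream concrete, here is a small Python sketch of the values being sent. It is only a model of the data, not the TIS-100 program itself; the names WIDTH, HEIGHT, WHITE, and row_commands are my own, and the width and height are taken from the thirty-pixel rows and eighteen rows mentioned in this post.

```python
# Assumed image size: 30 pixels per row and 18 rows, as described in the post.
WIDTH, HEIGHT = 30, 18
WHITE = 3  # color code for white


def row_commands(row, width=WIDTH, color=WHITE):
    """One draw command: start X, start Y, `width` color values, then -1."""
    return [0, row] + [color] * width + [-1]


# The full stream is just one command per row, top to bottom.
stream = [value for row in range(HEIGHT) for value in row_commands(row)]

print(row_commands(1)[:4], "...", row_commands(1)[-2:])
```

Each row costs 33 values (two coordinates, thirty pixels, one terminator), so every cycle spent on anything other than sending those values is overhead worth hunting down.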
Well, the very first optimization in the book for something like this is unrolling the loop:
By using all the extra blank lines and combining the jump label with another line of code, we can cram five extra copies of MOV 3, DOWN in there, so instead of drawing one pixel thirty times, we can draw six pixels five times. By going through the loop twenty-five fewer times per row, we execute the SUB 1 and JGZ LOOP instructions at the end of it twenty-five fewer times, and since there are eighteen rows, this cuts out 2 * 25 * 18 = 900 cycles, leaving us with 1,434!
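The savings can be double-checked with a few lines of Python. This is a sketch of the counting argument only, assuming (as the arithmetic above does) that the two loop-control instructions cost one cycle each and that nothing else about the program changes; loop_overhead is a name I made up for the illustration.

```python
def loop_overhead(pixels_per_row, pixels_per_pass, rows, overhead_per_pass=2):
    """Cycles spent on loop control (SUB 1 and JGZ LOOP) across the whole image."""
    passes, remainder = divmod(pixels_per_row, pixels_per_pass)
    if remainder:
        raise ValueError("loop body must evenly divide the row length")
    return passes * overhead_per_pass * rows


before = loop_overhead(30, 1, 18)  # one pixel per pass, thirty passes per row
after = loop_overhead(30, 6, 18)   # six pixels per pass, five passes per row
print(before - after)  # 900
print(2334 - (before - after))  # 1434
```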
But we can still do a little better than this. There’s still a little bottleneck in incrementing and outputting the row; because this one unit does everything, it has to waste time maintaining this counter variable when we could have another unit do the math in its spare time:
Which takes us down to 1,382 cycles and frees up three more lines for more unrolling! But there’s a problem: we can now make the loop body print out seven, eight, or nine pixels, but none of those evenly divide the row length of thirty, and it would be wasteful to print out extra pixels. What we can do is add two more pixel-plotting instructions to the loop and then use the last line of code to start drawing by jumping into the middle of the loop. If we run the loop four times but skip the first two MOV 3, DOWN instructions on the first pass, then we’ll draw 6 + 8 + 8 + 8 = 30 pixels!
Which runs in just 1,364 cycles!
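The trick of entering an unrolled loop partway through generalizes to any body size that doesn’t divide the row length. Here is a rough Python sketch of the entry-point arithmetic; it models only how many pixels each pass draws, not the TIS-100 code, and pixels_per_pass is a hypothetical helper name.

```python
def pixels_per_pass(body_size, row_length, passes):
    """Pixels drawn on each pass when the first pass enters the loop late.

    The first pass skips however many plotting instructions are needed so
    that the total comes out to exactly `row_length`.
    """
    skip = body_size * passes - row_length
    if not 0 <= skip < body_size:
        raise ValueError("this number of passes cannot produce the row length")
    return [body_size - skip] + [body_size] * (passes - 1)


print(pixels_per_pass(8, 30, 4))       # [6, 8, 8, 8]
print(sum(pixels_per_pass(8, 30, 4)))  # 30
```

With an eight-pixel body and four passes the skip works out to two instructions, which matches the 6 + 8 + 8 + 8 split above.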
The story continues in IMAGE TEST PATTERN 2.