don't click here

Duff's device

Discussion in 'Engineering & Reverse Engineering' started by Aurochs, Feb 1, 2007.

Thread Status:
Not open for further replies.
  1. Aurochs

    Aurochs

    Единый, могучий Советский Союз! Tech Member
    2,343
    0
    0
    Whatever catches my fancy
    Optimizing loops can often mean unrolling them. If the number of iterations is known at compile time, one could simply write the whole thing out (say, twenty <tt>move</tt> instructions rather than one <tt>move</tt> and a <tt>dbf</tt>). However, if you find yourself optimizing a loop, it may have hundreds of iterations, and completely unrolling it could inflate the module massively. In such cases, a good compromise is to partially unroll it (twenty <tt>move</tt> instructions and a <tt>dbf</tt> rather than 100 <tt>move</tt>s).

    Things get more complex when the number of iterations isn't known until runtime. One must decide how to handle partial iterations. Usually the best way to do this in assembly is to construct a jump table and use it to branch into the middle of the loop, but the common practice in C was to write a whole bunch of different partial loops and choose the correct one at runtime.

    Tom Duff found a rather curious method to create the assembly construct in C ?€” he interlaced a switch structure with a do loop.
    Code (Text):
    1.     send(to, from, count)
    2.     register short *to, *from;
    3.     register count;
    4.     {
    5.         register n=(count+7)/8;
    6.         switch(count%8){
    7.         case 0: do{ *to = *from++;
    8.         case 7:     *to = *from++;
    9.         case 6:     *to = *from++;
    10.         case 5:     *to = *from++;
    11.         case 4:     *to = *from++;
    12.         case 3:     *to = *from++;
    13.         case 2:     *to = *from++;
    14.         case 1:     *to = *from++;
    15.             }while(--n>0);
    16.         }
    17.     }
    This is perfectly valid C, and comiles and runs exactly as expected. The switch is only tested once, and is used to jump into the middle of the loop. Execution will continue through the rest of the switch until it reaches the loop's while() statement, at which point control is transfered to case 0: if it tests true.

    Now, if you're writing your Genesis games in C (compilers do exist), this may be perfect. But us hackers will have to use assembly. So I decided to hack up a quick and dirty subroutine in 68k that should be roughly equivalent to the above C. (Note: errors probably exist)
    Code (Text):
    1. ; An implimentation of Duff's device for the 680x0 processor family.
    2. ; d0 = count
    3. ; a0 = destination pointer
    4. ; a1 = source pointer (array)
    5.  
    6. SECTION duffdev
    7. PUBLIC  duff_device
    8.  
    9.  
    10. duff_device:
    11.     cmpi.w  #0,d0
    12.     beq.s   return;count is zero; no need to go any further
    13.  
    14.     move.w  d0,d1
    15.     addi.w  #7,d1
    16.     lsl.w   #3,d1
    17.     subi.w  #1,d1;d1 contains n (for loop iterations)
    18.    
    19.     lsl.w   #3,d0
    20.     lsr.w   #1,d0;lsb must be clear for jump table to work
    21.     move.w  jumpTbl(pc,d0.w),d2
    22.     jmp jumpTbl(pc,d2.w)
    23.  
    24. ;******************************************************************
    25. jumpTbl:
    26.     dc.w    case0
    27.     dc.w    case1
    28.     dc.w    case2
    29.     dc.w    case3
    30.     dc.w    case4
    31.     dc.w    case5
    32.     dc.w    case6
    33.     dc.w    case7
    34. ;******************************************************************
    35.  
    36. case0:  move.b  (a1)+,(a0)
    37. case7:  move.b  (a1)+,(a0)
    38. case6:  move.b  (a1)+,(a0)
    39. case5:  move.b  (a1)+,(a0)
    40. case4:  move.b  (a1)+,(a0)
    41. case3:  move.b  (a1)+,(a0)
    42. case2:  move.b  (a1)+,(a0)
    43. case1:  move.b  (a1)+,(a0)
    44.     dbf.s   d1,case0
    45.  
    46. return: rts
    47.  
    48. ENDS    duffdev
    Read more about Duff's device here.
     
  2. Evil Cheese

    Evil Cheese

    ..... Member
    296
    0
    0
    Thanks! This is a nifty little trick. :(
     
Thread Status:
Not open for further replies.