Optimized KosDec and NemDec, considerably faster decompression

Discussion in 'Engineering & Reverse Engineering' started by vladikcomper, Nov 4, 2013.

  1. Clownacy


    Tech Member
    If you're that desperate for cycles in KosDec, you could unroll the descriptor field loop (removing the need for the 'dbra d2,@skip\@'). Sonic 2 Nick Arcade had a "Chameleon" decompressor that had that done to it.
  2. MarkeyJester


    Original, No substitute Resident Jester
    You can speed up your bit-field reading by a few more cycles still, by replacing;

    Code (Text):
            move.b  (a0)+,d1                        ;  8 -.
            move.b  (a4,d1.w),d0                    ; 14  :
            lsl.w   #8,d0                           ; 22 -' d0.8-15  <- [a0++] (x-mirror, process left to right)
            move.b  (a0)+,d1                        ;  8 -.
            move.b  (a4,d1.w),d0                    ; 14 -' d0.0-7   <- [a0++] (x-mirror, process left to right)

    with;

    Code (Text):
            move.b  (a0)+,d1                        ;  8
            move.b  (a4,d1.w),(sp)                  ; 18 table byte -> high byte of the scratch word
            move.w  (sp),d0                         ;  8 d0.8-15 now holds it; d0.0-7 is overwritten below
            move.b  (a0)+,d1                        ;  8
            move.b  (a4,d1.w),d0                    ; 14
    You'll have to move the stack pointer back by a word before starting decompression, and forwards again after decompression is finished, but this'll allow you to shift the contents of d0 8 bits left without actually shifting. Going by the counts above, that is 56 cycles instead of 66 for each word read.
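
    A minimal sketch of that stack adjustment, assuming a4 already points at the 256-byte look-up table and the upper byte of d1.w is clear, as the snippets above require. The `NemDec_Sketch` label is made up for illustration, and the single bit-field read stands in for the whole decompression loop.

    ```asm
    NemDec_Sketch:                          ; hypothetical label, not from the actual routine
            subq.w  #2,sp                   ; reserve one scratch word; sp stays even
            moveq   #0,d1                   ; make sure d1.w is a clean 0-255 table index

            ; ... decompression loop would run here, inline ...
            move.b  (a0)+,d1                ; first source byte
            move.b  (a4,d1.w),(sp)          ; table byte -> high byte of the scratch word
            move.w  (sp),d0                 ; d0.8-15 <- that byte (d0.0-7 replaced below)
            move.b  (a0)+,d1                ; second source byte
            move.b  (a4,d1.w),d0            ; table byte -> d0.0-7

            addq.w  #2,sp                   ; release the scratch word before returning
            rts
    ```

    Anything that moves sp between the subq and the addq (a push, a bsr or a jsr) means (sp) no longer points at the scratch word, so the reads have to sit at the same stack depth as the reservation.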

    To be fair though, the only real speed optimisations left to perform on these routines are mostly ones that cost memory in return, à la look-up tables and unrolled loops (just like the method Clownacy mentioned). I think these routines are reaching the point where it's a matter of choice (the old speed vs size syndrome). Given that these optimisations cost memory rather than save it (and saving memory is what the compression is designed to do), I guess it really depends on the context of the game itself: do you need the speed more than the size? (I know I'm one to talk, given my compression requires a shit load of look-up tables (practice what you preach, right?), but still, I felt it worth noting; it could be important.)
  3. carljr


    Hey, this is a really nice trick! I'll have to implement that. I guess you could also just use (a1) instead of (sp), assuming there is at least 1 byte left to output, which there would have to be. My point is that you would not have to worry about reserving space on the stack and releasing it later.

    Are there any other forums or posts about optimizing code and coming up with new tricks? It works well for me when studying M68K programming.

    Thanks again, Carl
  4. nineko


    I am the Holy Cat Tech Member
    You can take a look at GenDev. And if you ever want to learn Z80 too, here is smspower.
  5. MarkeyJester


    Original, No substitute Resident Jester
    Sorry, but I'm afraid you cannot use (a1), as it could be on an odd address at any point. The 68k is not able to load word data from an odd address; it will trigger an address error exception. I chose the stack and sp simply because the other registers are all in use in ways that rule them out. You could use a previously unused address register, but you'd need to save the contents of that address register onto the stack and point it at a writable address, which would probably take longer than simply moving the stack back and using that.
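
    To illustrate the point, here is a rough sketch of the spare-register alternative next to the reason (a1) fails. It assumes a5 is the free register and `Scratch_Word` is an even, writable RAM address reachable with short absolute addressing; both names are hypothetical and not taken from the actual routines.

    ```asm
            ; Unsafe: a1 may be odd at any point
            ;       move.b  (a4,d1.w),(a1)  ; byte store, fine at any address
            ;       move.w  (a1),d0         ; word load, address error when a1 is odd

            ; Spare-register version (a5 and Scratch_Word are assumptions)
            move.l  a5,-(sp)                ; save whatever the caller had in a5
            lea     (Scratch_Word).w,a5     ; a5 -> even scratch word in RAM
            move.b  (a0)+,d1                ; first source byte
            move.b  (a4,d1.w),(a5)          ; table byte -> high byte of the scratch word
            move.w  (a5),d0                 ; even address, so the word load is safe
            move.b  (a0)+,d1                ; second source byte
            move.b  (a4,d1.w),d0            ; table byte -> low byte of d0
            movea.l (sp)+,a5                ; restore a5
    ```

    The save, lea and restore take the place of the subq/addq pair in the stack version, which is why this route is unlikely to come out ahead.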

    Good thinking though, I admire that you're thinking and trying to find more efficient ways around it =)
  6. Cyber Axe

    Cyber Axe

    Just tried using these in Sonic 1 GitHub. Kosinski works fine, it seems, but Nemesis doesn't; Vlad's worked fine when I replaced the following references:

    NemDec3 Replaced with NemPCD_NewRow
    NemDec4 Replaced with NemDec_BuildCodeTable
    NemDec_WriteRowToVDP Replaced with NemPCD_WriteRowToVDP