don't click here

Nemesis compression

Discussion in 'General Sega Discussion' started by shobiz, Jun 20, 2010.

  1. I started looking at the Nemesis decompression subroutine in Sonic & Knuckles a few days back, and the results can be seen at http://segaretro.org/Nemesis_compression under the Format Description and Decompression code headings. (I don't know if the decompression code is appropriate for that article actually - it seems useful for reference purposes but the lack of asm tags on Sega Retro spoils it somewhat).

    Anyway, I had a couple of questions which I was hoping somebody here could answer. Bear in mind that I've never formally studied information theory or compression theory in my life, so they might be quite dumb.

    Any idea why Sega would use Shannon-Fano? Huffman coding's been around since 1952 according to Wikipedia, so it must've been known to them. Additionally, the archives produced by the Nemesis compressor in TSDC don't seem to be optimal either - for a simple uncompressed file I was able to produce a hand-compressed file which was 4 bytes smaller than TSDC's output. I know 4 bytes is a trivial difference, but it might be larger for more complex files, although then again no one would hand-compress a more complex file anyway.

    As far as I can tell the compressor operates on runs of nybbles rather than runs of words. I haven't touched the Compression Theory part, however, because it was written by Nemesis himself, and so I wanted someone with more knowledge of this to confirm before modifying it.

    Also, in case anyone's interested, here's an annotated version of the Nemesis decompression code in Sonic & Knuckles in all its syntax-highlighted glory:

    Code (ASM):
    1. ; ---------------------------------------------------------------------------
    2. ; Nemesis decompression subroutine, decompresses art directly to VRAM
    3. ; Inputs:
    4. ; a0 = art address
    5. ; a VDP command to write to the destination VRAM address must be issued
    6. ; before calling this routine
    7. ; ---------------------------------------------------------------------------
    8.  
    9. ; =============== S U B R O U T I N E =======================================
    10.  
    11.  
    12. Nem_Decomp:
    13.         movem.l d0-a1/a3-a5,-(sp)
    14.         lea (Nem_PCD_WriteRowToVDP).l,a3
    15.         lea (VDP_data_port).l,a4    ; write all rows to the VDP data port
    16.         bra.s   Nem_Decomp_Main
    17. ; End of function Nem_Decomp
    18.  
    19. ; ---------------------------------------------------------------------------
    20. ; Nemesis decompression subroutine, decompresses art to RAM
    21. ; Inputs:
    22. ; a0 = art address
    23. ; a4 = destination RAM address
    24. ; ---------------------------------------------------------------------------
    25.  
    26. ; =============== S U B R O U T I N E =======================================
    27.  
    28.  
    29. Nem_Decomp_To_RAM:
    30.         movem.l d0-a1/a3-a5,-(sp)
    31.         lea (Nem_PCD_WriteRowToRAM).l,a3
    32. ; End of function Nem_Decomp_To_RAM
    33.  
    34. ; ---------------------------------------------------------------------------
    35. ; Main Nemesis decompression subroutine
    36. ; ---------------------------------------------------------------------------
    37.  
    38. ; =============== S U B R O U T I N E =======================================
    39.  
    40.  
    41. Nem_Decomp_Main:
    42.         lea (Nem_code_table).w,a1
    43.         move.w  (a0)+,d2    ; get number of patterns
    44.         lsl.w   #1,d2
    45.         bcc.s   +   ; branch if the sign bit isn't set
    46.         adda.w  #Nem_PCD_WriteRowToVDP_XOR-Nem_PCD_WriteRowToVDP,a3 ; otherwise the file uses XOR mode
    47. +
    48.         lsl.w   #2,d2   ; get number of 8-pixel rows in the uncompressed data
    49.         movea.w d2,a5   ; and store it in a5 because there aren't any spare data registers
    50.         moveq   #8,d3   ; 8 pixels in a pattern row
    51.         moveq   #0,d2
    52.         moveq   #0,d4
    53.         bsr.w   Nem_Build_Code_Table
    54.         move.b  (a0)+,d5    ; get first byte of compressed data
    55.         asl.w   #8,d5   ; shift up by a byte
    56.         move.b  (a0)+,d5    ; get second byte of compressed data
    57.         move.w  #$10,d6 ; set initial shift value
    58.         bsr.s   Nem_Process_Compressed_Data
    59.         movem.l (sp)+,d0-a1/a3-a5
    60.         rts
    61. ; End of function Nem_Decomp_Main
    62.  
    63. ; ---------------------------------------------------------------------------
    64. ; Part of the Nemesis decompressor, processes the actual compressed data
    65. ; ---------------------------------------------------------------------------
    66.  
    67. ; =============== S U B R O U T I N E =======================================
    68.  
    69. ; PCD is used throughout this subroutine as an initialism for Process_Compressed_Data
    70. Nem_Process_Compressed_Data:
    71.         move.w  d6,d7
    72.         subq.w  #8,d7   ; get shift value
    73.         move.w  d5,d1
    74.         lsr.w   d7,d1   ; shift so that high bit of the code is in bit position 7
    75.         cmpi.b  #%11111100,d1   ; are the high 6 bits set?
    76.         bcc.s   Nem_PCD_InlineData  ; if they are, it signifies inline data
    77.         andi.w  #$FF,d1
    78.         add.w   d1,d1
    79.         move.b  (a1,d1.w),d0    ; get the length of the code in bits
    80.         ext.w   d0
    81.         sub.w   d0,d6   ; subtract from shift value so that the next code is read next time around
    82.         cmpi.w  #9,d6   ; does a new byte need to be read?
    83.         bcc.s   +   ; if not, branch
    84.         addq.w  #8,d6
    85.         asl.w   #8,d5
    86.         move.b  (a0)+,d5    ; read next byte
    87. +
    88.         move.b  1(a1,d1.w),d1
    89.         move.w  d1,d0
    90.         andi.w  #$F,d1  ; get palette index for pixel
    91.         andi.w  #$F0,d0
    92.  
    93. Nem_PCD_GetRepeatCount:
    94.         lsr.w   #4,d0   ; get repeat count
    95.  
    96. Nem_PCD_WritePixel:
    97.         lsl.l   #4,d4   ; shift up by a nybble
    98.         or.b    d1,d4   ; write pixel
    99.         subq.w  #1,d3   ; has an entire 8-pixel row been written?
    100.         bne.s   Nem_PCD_WritePixel_Loop ; if not, loop
    101.         jmp (a3)    ; otherwise, write the row to its destination
    102. ; ---------------------------------------------------------------------------
    103.  
    104. Nem_PCD_NewRow:
    105.         moveq   #0,d4   ; reset row
    106.         moveq   #8,d3   ; reset nybble counter
    107.  
    108. Nem_PCD_WritePixel_Loop:
    109.         dbf d0,Nem_PCD_WritePixel
    110.         bra.s   Nem_Process_Compressed_Data
    111. ; ---------------------------------------------------------------------------
    112.  
    113. Nem_PCD_InlineData:
    114.         subq.w  #6,d6   ; 6 bits needed to signal inline data
    115.         cmpi.w  #9,d6
    116.         bcc.s   +
    117.         addq.w  #8,d6
    118.         asl.w   #8,d5
    119.         move.b  (a0)+,d5
    120. +
    121.         subq.w  #7,d6   ; and 7 bits needed for the inline data itself
    122.         move.w  d5,d1
    123.         lsr.w   d6,d1   ; shift so that low bit of the code is in bit position 0
    124.         move.w  d1,d0
    125.         andi.w  #$F,d1  ; get palette index for pixel
    126.         andi.w  #$70,d0 ; high nybble is repeat count for pixel
    127.         cmpi.w  #9,d6
    128.         bcc.s   Nem_PCD_GetRepeatCount
    129.         addq.w  #8,d6
    130.         asl.w   #8,d5
    131.         move.b  (a0)+,d5
    132.         bra.s   Nem_PCD_GetRepeatCount
    133. ; ---------------------------------------------------------------------------
    134.  
    135. Nem_PCD_WriteRowToVDP:
    136.         move.l  d4,(a4) ; write 8-pixel row
    137.         subq.w  #1,a5
    138.         move.w  a5,d4   ; have all the 8-pixel rows been written?
    139.         bne.s   Nem_PCD_NewRow  ; if not, branch
    140.         rts ; otherwise the decompression is finished
    141. ; ---------------------------------------------------------------------------
    142.  
    143. Nem_PCD_WriteRowToVDP_XOR:
    144.         eor.l   d4,d2   ; XOR the previous row by the current row
    145.         move.l  d2,(a4) ; and write the result
    146.         subq.w  #1,a5
    147.         move.w  a5,d4
    148.         bne.s   Nem_PCD_NewRow
    149.         rts
    150. ; ---------------------------------------------------------------------------
    151.  
    152. Nem_PCD_WriteRowToRAM:
    153.         move.l  d4,(a4)+
    154.         subq.w  #1,a5
    155.         move.w  a5,d4
    156.         bne.s   Nem_PCD_NewRow
    157.         rts
    158. ; ---------------------------------------------------------------------------
    159.  
    160. Nem_PCD_WriteRowToRAM_XOR:
    161.         eor.l   d4,d2
    162.         move.l  d2,(a4)+
    163.         subq.w  #1,a5
    164.         move.w  a5,d4
    165.         bne.s   Nem_PCD_NewRow
    166.         rts
    167. ; End of function Nem_Process_Compressed_Data
    168.  
    169. ; ---------------------------------------------------------------------------
    170. ; Part of the Nemesis decompressor, builds the code table (in RAM)
    171. ; ---------------------------------------------------------------------------
    172.  
    173. ; =============== S U B R O U T I N E =======================================
    174.  
    175. ; BCT is used throughout this subroutine as an initialism for Build_Code_Table
    176. Nem_Build_Code_Table:
    177.         move.b  (a0)+,d0    ; read first byte
    178.  
    179. Nem_BCT_ChkEnd:
    180.         cmpi.b  #$FF,d0 ; has the end of the code table description been reached?
    181.         bne.s   Nem_BCT_NewPalIndex ; if not, branch
    182.         rts ; otherwise, this subroutine's work is done
    183. ; ---------------------------------------------------------------------------
    184.  
    185. Nem_BCT_NewPalIndex:
    186.         move.w  d0,d7
    187.  
    188. Nem_BCT_Loop:
    189.         move.b  (a0)+,d0    ; read next byte
    190.         cmpi.b  #$80,d0 ; sign bit being set signifies a new palette index
    191.         bcc.s   Nem_BCT_ChkEnd  ; a bmi could have been used instead of a compare and bcc
    192.         move.b  d0,d1
    193.         andi.w  #$F,d7  ; get palette index
    194.         andi.w  #$70,d1 ; get repeat count for palette index
    195.         or.w    d1,d7   ; combine the two
    196.         andi.w  #$F,d0  ; get the length of the code in bits
    197.         move.b  d0,d1
    198.         lsl.w   #8,d1
    199.         or.w    d1,d7   ; combine with palette index and repeat count to form code table entry
    200.         moveq   #8,d1
    201.         sub.w   d0,d1   ; is the code 8 bits long?
    202.         bne.s   Nem_BCT_ShortCode   ; if not, a bit of extra processing is needed
    203.         move.b  (a0)+,d0    ; get code
    204.         add.w   d0,d0   ; each code gets a word-sized entry in the table
    205.         move.w  d7,(a1,d0.w)    ; store the entry for the code
    206.         bra.s   Nem_BCT_Loop    ; repeat
    207. ; ---------------------------------------------------------------------------
    208.  
    209. ; the Nemesis decompressor uses prefix-free codes (no valid code is a prefix of a longer code)
    210. ; e.g. if 10 is a valid 2-bit code, 110 is a valid 3-bit code but 100 isn't
    211. ; also, when the actual compressed data is processed the high bit of each code is in bit position 7
    212. ; so the code needs to be bit-shifted appropriately over here before being used as a code table index
    213. ; additionally, the code needs multiple entries in the table because no masking is done during compressed data processing
    214. ; so if 11000 is a valid code then all indices of the form 11000XXX need to have the same entry
    215. Nem_BCT_ShortCode:
    216.         move.b  (a0)+,d0    ; get code
    217.         lsl.w   d1,d0   ; shift so that high bit is in bit position 7
    218.         add.w   d0,d0   ; get index into code table
    219.         moveq   #1,d5
    220.         lsl.w   d1,d5
    221.         subq.w  #1,d5   ; d5 = 2^d1 - 1
    222.  
    223. Nem_BCT_ShortCode_Loop:
    224.         move.w  d7,(a1,d0.w)    ; store entry
    225.         addq.w  #2,d0   ; increment index
    226.         dbf d5,Nem_BCT_ShortCode_Loop   ; repeat for required number of entries
    227.         bra.s   Nem_BCT_Loop
    228. ; End of function Nem_Build_Code_Table
     
  2. Chilly Willy

    Chilly Willy

    Just pining for the fjords Tech Member
    755
    21
    18
    Heart of NASCAR
    Doom CD32X Fusion
    Huffman was avoided by lesser processors in games because Huffman is SLOOOOOOOOOOOOOOOOOOW to decode. On the 68020, decoding the Huffman portion of mpeg video or audio was often up to 60% of your CPU time. The PS2 included special hardware to extract random size bitfields from bitstreams to help accelerate Huffman decoding.

    A game would only use Huffman if it were loaded at the start of a level all at once and never done again during the level. Anything that needs to be decompressed on the fly during gameplay was done in a format that decoded fast, even if it meant less compression.
     
  3. Ah okay, makes sense. Sonic 3 & Knuckles actually uses Nemesis like that for the most part - most of the art which needs to be decompressed during a level is stored as KosM.

    Also, why would Nemesis be used for the 16x16s in Sonic CD? Enigma makes a lot more sense for those since it was specifically made to deal with plane/block mappings.
     
  4. Chilly Willy

    Chilly Willy

    Just pining for the fjords Tech Member
    755
    21
    18
    Heart of NASCAR
    Doom CD32X Fusion
    Two different teams were working on Sonic CD and Sonic 2 at the time, so maybe there was a difference of opinion as to compression strength vs decompression speed between the two teams. You might think that with a CD, the Sonic CD team would be less concerned about compression, but it's just the opposite - the CD only gives you about 480 KB RAM - the program size for data, while a cart gives you as much space as you're willing to spend on larger roms. The CD also gives you about 50% more CPU power for decompressing things. So I'd expect the CD team to use the better compressor/slower decompressor on data.
     
  5. I understand the part about the greater size restrictions on the Sega CD, but what I meant to say was that Enigma was specifically made to compress plane mappings (of which 16x16 mappings are a variant) and so should theoretically compress those better than Nemesis. In fact, I just tested it out with the 16x16 block mappings for all six Sonic 1 levels and got the following results (using TSDC v2.2 RC3):

    Code (Text):
    1. Uncompressed:                     19632 bytes
    2. Enigma (recompressed using KENS): 12686 bytes
    3. Enigma (from original game):      13504 bytes
    4. Nemesis:                          14230 bytes
    5. Kosinski:                         14830 bytes
    I can't imagine the results of a similar experiment for Sonic CD varying too wildly.
     
  6. Chilly Willy

    Chilly Willy

    Just pining for the fjords Tech Member
    755
    21
    18
    Heart of NASCAR
    Doom CD32X Fusion
    It's quite possible that whoever was in charge of the compression for the game was just an idiot. :specialed:

    I haven't looked into it, but maybe there was a patent issue at work at the time that mandated one of the other, even though it wasn't as good.