I started looking at the Nemesis decompression subroutine in Sonic & Knuckles a few days back, and the results can be seen at http://segaretro.org/Nemesis_compression under the Format Description and Decompression code headings. (I don't know if the decompression code is appropriate for that article actually - it seems useful for reference purposes but the lack of asm tags on Sega Retro spoils it somewhat). Anyway, I had a couple of questions which I was hoping somebody here could answer. Bear in mind that I've never formally studied information theory or compression theory in my life, so they might be quite dumb. Any idea why Sega would use Shannon-Fano? Huffman coding's been around since 1952 according to Wikipedia, so it must've been known to them. Additionally, the archives produced by the Nemesis compressor in TSDC don't seem to be optimal either - for a simple uncompressed file I was able to produce a hand-compressed file which was 4 bytes smaller than TSDC's output. I know 4 bytes is a trivial difference, but it might be larger for more complex files, although then again no one would hand-compress a more complex file anyway. As far as I can tell the compressor operates on runs of nybbles rather than runs of words. I haven't touched the Compression Theory part, however, because it was written by Nemesis himself, and so I wanted someone with more knowledge of this to confirm before modifying it. Also, in case anyone's interested, here's an annotated version of the Nemesis decompression code in Sonic & Knuckles in all its syntax-highlighted glory: Code (ASM): ; --------------------------------------------------------------------------- ; Nemesis decompression subroutine, decompresses art directly to VRAM ; Inputs: ; a0 = art address ; a VDP command to write to the destination VRAM address must be issued ; before calling this routine ; --------------------------------------------------------------------------- ; =============== S U B R O U T I N E ======================================= Nem_Decomp: movem.l d0-a1/a3-a5,-(sp) lea (Nem_PCD_WriteRowToVDP).l,a3 lea (VDP_data_port).l,a4 ; write all rows to the VDP data port bra.s Nem_Decomp_Main ; End of function Nem_Decomp ; --------------------------------------------------------------------------- ; Nemesis decompression subroutine, decompresses art to RAM ; Inputs: ; a0 = art address ; a4 = destination RAM address ; --------------------------------------------------------------------------- ; =============== S U B R O U T I N E ======================================= Nem_Decomp_To_RAM: movem.l d0-a1/a3-a5,-(sp) lea (Nem_PCD_WriteRowToRAM).l,a3 ; End of function Nem_Decomp_To_RAM ; --------------------------------------------------------------------------- ; Main Nemesis decompression subroutine ; --------------------------------------------------------------------------- ; =============== S U B R O U T I N E ======================================= Nem_Decomp_Main: lea (Nem_code_table).w,a1 move.w (a0)+,d2 ; get number of patterns lsl.w #1,d2 bcc.s + ; branch if the sign bit isn't set adda.w #Nem_PCD_WriteRowToVDP_XOR-Nem_PCD_WriteRowToVDP,a3 ; otherwise the file uses XOR mode + lsl.w #2,d2 ; get number of 8-pixel rows in the uncompressed data movea.w d2,a5 ; and store it in a5 because there aren't any spare data registers moveq #8,d3 ; 8 pixels in a pattern row moveq #0,d2 moveq #0,d4 bsr.w Nem_Build_Code_Table move.b (a0)+,d5 ; get first byte of compressed data asl.w #8,d5 ; shift up by a byte move.b (a0)+,d5 ; get second byte of compressed data move.w #$10,d6 ; set initial shift value bsr.s Nem_Process_Compressed_Data movem.l (sp)+,d0-a1/a3-a5 rts ; End of function Nem_Decomp_Main ; --------------------------------------------------------------------------- ; Part of the Nemesis decompressor, processes the actual compressed data ; --------------------------------------------------------------------------- ; =============== S U B R O U T I N E ======================================= ; PCD is used throughout this subroutine as an initialism for Process_Compressed_Data Nem_Process_Compressed_Data: move.w d6,d7 subq.w #8,d7 ; get shift value move.w d5,d1 lsr.w d7,d1 ; shift so that high bit of the code is in bit position 7 cmpi.b #%11111100,d1 ; are the high 6 bits set? bcc.s Nem_PCD_InlineData ; if they are, it signifies inline data andi.w #$FF,d1 add.w d1,d1 move.b (a1,d1.w),d0 ; get the length of the code in bits ext.w d0 sub.w d0,d6 ; subtract from shift value so that the next code is read next time around cmpi.w #9,d6 ; does a new byte need to be read? bcc.s + ; if not, branch addq.w #8,d6 asl.w #8,d5 move.b (a0)+,d5 ; read next byte + move.b 1(a1,d1.w),d1 move.w d1,d0 andi.w #$F,d1 ; get palette index for pixel andi.w #$F0,d0 Nem_PCD_GetRepeatCount: lsr.w #4,d0 ; get repeat count Nem_PCD_WritePixel: lsl.l #4,d4 ; shift up by a nybble or.b d1,d4 ; write pixel subq.w #1,d3 ; has an entire 8-pixel row been written? bne.s Nem_PCD_WritePixel_Loop ; if not, loop jmp (a3) ; otherwise, write the row to its destination ; --------------------------------------------------------------------------- Nem_PCD_NewRow: moveq #0,d4 ; reset row moveq #8,d3 ; reset nybble counter Nem_PCD_WritePixel_Loop: dbf d0,Nem_PCD_WritePixel bra.s Nem_Process_Compressed_Data ; --------------------------------------------------------------------------- Nem_PCD_InlineData: subq.w #6,d6 ; 6 bits needed to signal inline data cmpi.w #9,d6 bcc.s + addq.w #8,d6 asl.w #8,d5 move.b (a0)+,d5 + subq.w #7,d6 ; and 7 bits needed for the inline data itself move.w d5,d1 lsr.w d6,d1 ; shift so that low bit of the code is in bit position 0 move.w d1,d0 andi.w #$F,d1 ; get palette index for pixel andi.w #$70,d0 ; high nybble is repeat count for pixel cmpi.w #9,d6 bcc.s Nem_PCD_GetRepeatCount addq.w #8,d6 asl.w #8,d5 move.b (a0)+,d5 bra.s Nem_PCD_GetRepeatCount ; --------------------------------------------------------------------------- Nem_PCD_WriteRowToVDP: move.l d4,(a4) ; write 8-pixel row subq.w #1,a5 move.w a5,d4 ; have all the 8-pixel rows been written? bne.s Nem_PCD_NewRow ; if not, branch rts ; otherwise the decompression is finished ; --------------------------------------------------------------------------- Nem_PCD_WriteRowToVDP_XOR: eor.l d4,d2 ; XOR the previous row by the current row move.l d2,(a4) ; and write the result subq.w #1,a5 move.w a5,d4 bne.s Nem_PCD_NewRow rts ; --------------------------------------------------------------------------- Nem_PCD_WriteRowToRAM: move.l d4,(a4)+ subq.w #1,a5 move.w a5,d4 bne.s Nem_PCD_NewRow rts ; --------------------------------------------------------------------------- Nem_PCD_WriteRowToRAM_XOR: eor.l d4,d2 move.l d2,(a4)+ subq.w #1,a5 move.w a5,d4 bne.s Nem_PCD_NewRow rts ; End of function Nem_Process_Compressed_Data ; --------------------------------------------------------------------------- ; Part of the Nemesis decompressor, builds the code table (in RAM) ; --------------------------------------------------------------------------- ; =============== S U B R O U T I N E ======================================= ; BCT is used throughout this subroutine as an initialism for Build_Code_Table Nem_Build_Code_Table: move.b (a0)+,d0 ; read first byte Nem_BCT_ChkEnd: cmpi.b #$FF,d0 ; has the end of the code table description been reached? bne.s Nem_BCT_NewPalIndex ; if not, branch rts ; otherwise, this subroutine's work is done ; --------------------------------------------------------------------------- Nem_BCT_NewPalIndex: move.w d0,d7 Nem_BCT_Loop: move.b (a0)+,d0 ; read next byte cmpi.b #$80,d0 ; sign bit being set signifies a new palette index bcc.s Nem_BCT_ChkEnd ; a bmi could have been used instead of a compare and bcc move.b d0,d1 andi.w #$F,d7 ; get palette index andi.w #$70,d1 ; get repeat count for palette index or.w d1,d7 ; combine the two andi.w #$F,d0 ; get the length of the code in bits move.b d0,d1 lsl.w #8,d1 or.w d1,d7 ; combine with palette index and repeat count to form code table entry moveq #8,d1 sub.w d0,d1 ; is the code 8 bits long? bne.s Nem_BCT_ShortCode ; if not, a bit of extra processing is needed move.b (a0)+,d0 ; get code add.w d0,d0 ; each code gets a word-sized entry in the table move.w d7,(a1,d0.w) ; store the entry for the code bra.s Nem_BCT_Loop ; repeat ; --------------------------------------------------------------------------- ; the Nemesis decompressor uses prefix-free codes (no valid code is a prefix of a longer code) ; e.g. if 10 is a valid 2-bit code, 110 is a valid 3-bit code but 100 isn't ; also, when the actual compressed data is processed the high bit of each code is in bit position 7 ; so the code needs to be bit-shifted appropriately over here before being used as a code table index ; additionally, the code needs multiple entries in the table because no masking is done during compressed data processing ; so if 11000 is a valid code then all indices of the form 11000XXX need to have the same entry Nem_BCT_ShortCode: move.b (a0)+,d0 ; get code lsl.w d1,d0 ; shift so that high bit is in bit position 7 add.w d0,d0 ; get index into code table moveq #1,d5 lsl.w d1,d5 subq.w #1,d5 ; d5 = 2^d1 - 1 Nem_BCT_ShortCode_Loop: move.w d7,(a1,d0.w) ; store entry addq.w #2,d0 ; increment index dbf d5,Nem_BCT_ShortCode_Loop ; repeat for required number of entries bra.s Nem_BCT_Loop ; End of function Nem_Build_Code_Table
Huffman was avoided by lesser processors in games because Huffman is SLOOOOOOOOOOOOOOOOOOW to decode. On the 68020, decoding the Huffman portion of mpeg video or audio was often up to 60% of your CPU time. The PS2 included special hardware to extract random size bitfields from bitstreams to help accelerate Huffman decoding. A game would only use Huffman if it were loaded at the start of a level all at once and never done again during the level. Anything that needs to be decompressed on the fly during gameplay was done in a format that decoded fast, even if it meant less compression.
Ah okay, makes sense. Sonic 3 & Knuckles actually uses Nemesis like that for the most part - most of the art which needs to be decompressed during a level is stored as KosM. Also, why would Nemesis be used for the 16x16s in Sonic CD? Enigma makes a lot more sense for those since it was specifically made to deal with plane/block mappings.
Two different teams were working on Sonic CD and Sonic 2 at the time, so maybe there was a difference of opinion as to compression strength vs decompression speed between the two teams. You might think that with a CD, the Sonic CD team would be less concerned about compression, but it's just the opposite - the CD only gives you about 480 KB RAM - the program size for data, while a cart gives you as much space as you're willing to spend on larger roms. The CD also gives you about 50% more CPU power for decompressing things. So I'd expect the CD team to use the better compressor/slower decompressor on data.
I understand the part about the greater size restrictions on the Sega CD, but what I meant to say was that Enigma was specifically made to compress plane mappings (of which 16x16 mappings are a variant) and so should theoretically compress those better than Nemesis. In fact, I just tested it out with the 16x16 block mappings for all six Sonic 1 levels and got the following results (using TSDC v2.2 RC3): Code (Text): Uncompressed: 19632 bytes Enigma (recompressed using KENS): 12686 bytes Enigma (from original game): 13504 bytes Nemesis: 14230 bytes Kosinski: 14830 bytes I can't imagine the results of a similar experiment for Sonic CD varying too wildly.
It's quite possible that whoever was in charge of the compression for the game was just an idiot. :specialed: I haven't looked into it, but maybe there was a patent issue at work at the time that mandated one of the other, even though it wasn't as good.