don't click here

Utility clownnemesis - An Accurate Nemesis Compressor

Discussion in 'Engineering & Reverse Engineering' started by Clownacy, Feb 12, 2024.

  1. Clownacy

    Clownacy

    Tech Member
    1,092
    664
    93
    Here is a Nemesis compressor and decompressor that I made.

    Notably, this compresses data identically to Sega's original Nemesis compressor. This is useful for disassemblies, as it allows assets to be stored uncompressed for ease of viewing and editing without causing the built ROM to become inaccurate due to differences in compression.

    The compressor also has an option to compress data better than Sega's compressor, though the compressed data will no longer be accurate to it. Currently this improved compression makes compressed data 1% smaller.

    clownnemesis is written in ANSI C and its licence is 0BSD.

    Download: https://github.com/Clownacy/clownnemesis/releases

    Combined with my accurate Kosinski compressor and the rediscovered original Saxman compressor, 3 out of the 4 main Sonic compression formats now have accurate compressors!

    Making a Nemesis Compressor
    [Another cross-post from the blog. I figured that it is relevant here, being about creating a compressor for a compression format used by the Sonic games.]

    Long ago, I made a compressor for Kosinski, which is a compression format seemingly created by Sega and used in various Mega Drive games, including Sonic the Hedgehog. There are two other compression format used by those games as well: Enigma and Nemesis. Having recently become unhappy with the portability and code quality of Flamewing's mdcomp compression library, I decided that I would make my own Nemesis compressor. If only I knew what I was getting myself into...

    I began by reading the trusty Sega Retro page on the format, which provided a detailed specification of the format and an overview of the algorithms used to encode data in it. With that information alone, I was able to produce a working decompressor.

    It was interesting to see just how simple Nemesis was: it is a nybble-based run-length encoding scheme with each run placed into a kind of dictionary that substitutes it with a variable length code. The most common runs get the shortest codes, and the rarest runs get the longest codes. In effect, the data has two layers of compression applied to it. The process of substituting data for variable length codes is known as 'Huffman encoding', though this is a misnomer as I will explain later.

    With the decompressor done, it was time to move onto making the compressor. I had heard from Vladikcomper that the Nemesis data in the Sonic games was encoded with "Shannon coding", so I figured that I would begin with making a compressor which used that.

    Upon looking into it, I learned that Shannon coding was the first of its kind, dating all the way back to the 1940s. It derived the variable-length codes from the frequency of the symbol (nybble-run) within the data. It was pretty easy to implement, but soon it became obvious that this was not the algorithm that Sega used in their Nemesis compressor: the lengths of the variable-length codes differed unpredictably, and, rather than be directly derived from the symbol's frequency, the codes were clearly generated in a structured pattern. Undeterred, I continued working on my compressor using Shannon coding. In the end, I had it working and successfully producing valid Nemesis-encoded data. Unfortunately, the use of Shannon coding had proven to be especially unwise, as data produced by the compressor was considerably larger than the data produced by Sega's compressor.

    Whilst reading about Shannon coding, I learned of a similar algorithm known as Fano coding, which is frequently confused with Shannon coding. This would explain why Vladikcomper claimed that Shannon coding was used by Sega's Nemesis compressor. So, I began modifying my compressor to use Fano coding instead.

    Fano coding works completely differently to Shannon coding, producing its variable-length codes by repeatedly performing a binary split on a list of the symbols sorted by their frequency within the data. A single recursive function was all that was needed to implement this coding scheme. This matched the output of Sega's compressor exactly (though unfortunately some sorting quirks keep it from being binary-exact), meaning that my compressor was able to match its compression ratio, being neither more efficient nor less efficient!

    While it was nice to have a Nemesis compressor with a decent compression ratio, I wanted it to be able to compress data better than Sega's compressor. After all, that is what I did with my Kosinski compressor all those years ago. For this, I knew that there was only one solution: Huffman coding.

    Huffman coding is the perfection of what Shannon coding pioneered: while Shannon and Fano generate codes that are 'good', they are not perfect; in contrast, Huffman always produces codes that are optimal. The trade-off is that Huffman is a lot more complex to implement, requiring the construction and parsing of an entire binary tree using priority queues. While trivial to do with C++'s standard library, my compressor uses ANSI C (C89), so I had to get very creative in order to implement this as easily, but also efficiently, as possible. In the end, I was able to implement this using a single statically-allocated array. However, something was wrong: the generated data was larger than the output of Fano coding. How could Huffman, an optimal code generator, be less efficient than Fano?

    The first shortcoming that I noticed was with a quirk of the Nemesis format: there is a specific code that is reserved - 111111. Variable length codes are not allowed to use this code nor a prefix of it (1, 11, 111, etc.). With Fano coding, codes that conflicted with the reserved code were rare, whereas with Shannon and Huffman they were common. Conflicting codes had to either be rejected (causing all instances of the symbol they represent to instead be inlined into the bit stream) or moved to another, longer, code.

    The second shortcoming was that codes had a limit to the length that they could be: Nemesis only supports up to a length of 8, while Huffman coding was generating its optimal codes on the assumption that they could be any length. The fact that these overly-long codes had to be rejected meant that the output of Huffman coding was no longer optimal! Granted, Shannon coding and Fano coding are susceptible to this as well, but somehow it was just worse with Huffman. However, Huffman has a solution to this: there exists a variant of Huffman coding that is specifically intended for codes with a limited maximum length, producing codes that are optimal within this constraint.

    Performing Huffman coding in this manner requires what is known as the package-merge algorithm, which can be implemented as a slightly modified version of the usual Huffman algorithm with an extra queue. Rather than produce a whole tree, this algorithm produces a series of them, and, rather than produce the codes directly, parsing these trees produces the codes' lengths. A simple algorithm can be used to generate 'canonical Huffman' codes from these lengths.

    Unbelievably, despite implementing this algorithm, the generated data was still too large. I was able to find one file that compressed slightly better with my compressor than Sega's, but the vast majority compressed to a considerably larger size. Examining the data, I was able to discern a pattern: while Huffman coding allowed many more symbols to be assigned a code, the most common symbols would be assigned longer codes than they would be with Fano coding. The space saved by assigning codes to more symbols was counteracted by the most common symbols having larger codes. This was unavoidable, as Huffman is a 'greedy' algorithm, the same kind of algorithm that keeps standard LZSS compressors from being optimal: Huffman is only ideal when you do not consider that encoding fewer symbols may be for the better.

    I considered hackishly adding a heuristic, but could not stand the thought of it. Then I considered brute-forcing: I could repeat the tree-generation process over and over again, using fewer and fewer symbols until there was only one left, all the while I would be keeping track of which iteration would require the fewest bits to output. This proved to work incredibly well in practice: the brute-forcing does not scale with the size of the input data, but rather the number of symbols and the maximum code length, keeping the performance impact low; but, more importantly, this allowed Huffman to finally outdo Fano! It is not a huge amount: just a dozen bytes or so for small files or a few hundred bytes for larger ones, but that is about on-par with how my perfect Kosinski compressor compares to standard Kosinski compressors!

    It has been a wild ride, but I finally have a Nemesis compressor that outperforms the one used by Sega. It still produces larger data than Flamewing's compressor, but that is to be expected considering the crazy black magic which that thing does. I am glad to finally see the completion of this project: I have been working on it non-stop for three days. So many headaches and so many hours spent comprehending obscure, poorly-documented algorithms. If only I was good at reading academic papers.

    With this and clownlzss, I hope to replace all of Flamewing's compressors in ClownMapEd. Then I can stop getting a bunch of compiler warnings about the compressors treating the return value of 'std::istream::tellg()' as an integer. Ew.

    The source code to my Nemesis compressor and decompressor can be found here: https://github.com/Clownacy/clownnemesis
     
    Last edited: Feb 18, 2024
    • Informative Informative x 10
    • Like Like x 3
    • Useful Useful x 1
    • List
  2. That was a great read! I only have a surface-level understanding of compression algorithms, but the post was written at a level where I was able to follow along without issue.
     
  3. BenoitRen

    BenoitRen

    Tech Member
    753
    364
    63
    Now I'm wondering what this black magic is that flamewing's compressor does.
     
    • Like Like x 1
    • Agree Agree x 1
    • List
  4. Overlord

    Overlord

    Now playable in Smash Bros Ultimate Moderator
    19,382
    1,036
    93
    Long-term happiness
    I always find your posts fascinating, nice job on the writeup.
     
    • Agree Agree x 5
    • Like Like x 1
    • List
  5. RealMalachi

    RealMalachi

    you can call me mal Member
    Nice work! Nemesis actually has some good use for CD and Mode 1 games due to the BIOS' (well at least the ones that matter) having a branch to it and Enigma in a common area, so this could be quite useful for general CD homebrew

    Since it's partially related, are there any decompression changes you can think of that would improve compression size or decompression speed? Kinda like a "NemesisPlus" so to speak
     
  6. Clownacy

    Clownacy

    Tech Member
    1,092
    664
    93
    So far, my biggest complaint about Nemesis is that reserved code, 111111. Its presence requires that perfectly-good Huffman trees be meddled-with in order to prevent conflicts. I imagine that a fair number of codes are rendered unusable because of this, as skipping such an arbitrary code does not fit well with how canonical Huffman codes are generated (they follow a very specific pattern which skipping disrupts): rather than skipping just one canonical Huffman code, it seems to skip multiple, causing fewer codes to be generated in total before reaching the 8-bit limit. I think that a solution to this would have been to make the reserved code configurable, able to vary from compressed data to compressed data. That way, when generating the Huffman tree, only one code would have to be sacrificed for it rather than an entire range of them.

    Another flaw of the format is that the code table (the section at the start of the compressed data which specifies each code that will appear in the following bitstream) specifies both the code's length and the code itself. As I described in my original post, it's actually possible to generate codes using just the lengths; storing the code in the compressed data is unnecessary, and therefore a complete waste of space. This mistake actually limits the maximum length of a nybble-run to 8 bits, rather than 16, as the highest bit of the nybble that stores the length of the run minus one is used to store a flag that is related to the code. If there were no code, and only a code length, then that flag would not be necessary, and the entire nybble could be used for storing the length of the run.

    A much more wild improvement that could be made to Nemesis is to combine it with Kosinski, but this would not be suitable for the Mega Drive. Despite what Sega Retro's Nemesis page says, Kosinski often compresses data better than Nemesis. However, combining it with Nemesis could yield even better compression. As explained earlier, Nemesis is actually two compression schemes layered on top of each other: run-length coding at the bottom, and so-called "Huffman coding" on top. To combine Kosinski with Nemesis, the run-length encoding would be replaced with Kosinski. Then, instead of representing a nybble-run, each code would represent an LZSS match. Combining LZSS with "Huffman coding" is actually what ZIP files do, in their DEFLATE format. The complication of this is that there are many more types of LZSS match than there are types of nybble run, making the 8-bit maximum length of codes far more limiting. To resolve this, the limit may have to be raised to 16-bit, which complicates the process of matching codes to their corresponding data as a plain and fast lookup table can no longer be used (such a table would be over 64KiB - the entire size of the Mega Drive's RAM). Alternatively, the variety of LZSS matches may have to be arbitrarily limited to a number that 8-bit codes can encode, which would harm the compression ratio.

    Nemesis's slow speed comes from the fact that it relies so heavily on processing a stream of bits rather than bytes, but using a bitstream is so fundamental to Nemesis that it cannot be avoided. At the same time, Nemesis's speed and compression ratio are eclipsed by Kosinski. While Nemesis is mainly used for streaming tile data to VRAM over the course of several frames, a Kosinski decompressor could be made to do the same thing. In a roundabout way, the best way to 'improve Nemesis' is to just use Kosinski instead.
     
    Last edited: Feb 13, 2024
  7. MainMemory

    MainMemory

    Kate the Wolf Tech Member
    4,783
    363
    63
    SonLVL
    Isn't that literally what Moduled Kosinski is?
     
  8. Clownacy

    Clownacy

    Tech Member
    1,092
    664
    93
    Yes, but even a decompressor for vanilla Kosinski could be made to write directly to VRAM, albeit with a 0x2000-byte dictionary buffer in RAM as opposed to Moduled Kosinski's 0x1000-byte output buffer.
     
  9. Brainulator

    Brainulator

    Regular garden-variety member Member
    What are these sorting quirks you speak of? If it's with the dictionary, I've only ever seen things being sorted numerically.
     
  10. Clownacy

    Clownacy

    Tech Member
    1,092
    664
    93
    Sega's compressor had a quirky way of sorting its list of nybble runs before performing the binary split on it: after sorting the list from most common to least common, it would then shuffle some of the nybble runs around. The effect of this is that the common nybble runs were more likely to get shorter codes. I'm still trying to figure out how exactly this shuffling was done, because nybble runs with the exact same frequency have unpredictable ordering.
     
    Last edited: Feb 14, 2024
    • Like Like x 1
    • Informative Informative x 1
    • List
  11. Brainulator

    Brainulator

    Regular garden-variety member Member
    May I see an example of this?
     
  12. Nemesis

    Nemesis

    Tech Member
    I ran into some of the same things doing my original Nemesis compressor. Shannon-fano coding is the proper name for the approach sega used, and I also managed to binary exact match the output of the original sega compression. I also beat it, but I didn't use any kind of abstract coding methods. The thing the approach sega used doesn't take into account is the actual effect on binary length both promoting one pattern into a shorter form, and demoting other pattern(s) into a longer form has. My compression routine just asked the question "would the binary actually be smaller if I make this change?" before committing to it. That takes a lot more computation, but it's lightning fast on modern hardware, where it would have been quite a performance penalty on the hardware sega developed on back in the day.

    Pretty sure the original code for my compressor/decompressor is up somewhere? No idea how it compares to what flamewing did, and I wrote it a long, long time ago, so no idea how well it was actually written.

    Couldn't find the source code on Sonic retro wiki. Guess it disappeared to time. I'll look in my archives, I have all the source here.
     
  13. Nemesis

    Nemesis

    Tech Member
    • Like Like x 9
    • Informative Informative x 1
    • List
  14. Selbi

    Selbi

    The Euphonic Mess Member
    1,508
    76
    28
    Northern Germany
    Sonic ERaZor
    "Oh, and one other note. The rom breakdown might take awhile to load if you're on dialup :D (no joke, it's over 100Kb)"

    I'm afraid, you're correct. Huge thanks for sharing this though! This is an amazing time capsule and I'm glad it wasn't lost to time :D
     
    • Like Like x 2
    • Agree Agree x 1
    • List
  15. Nemesis

    Nemesis

    Tech Member
  16. Brainulator

    Brainulator

    Regular garden-variety member Member
  17. Nemesis

    Nemesis

    Tech Member
    That would have been this file, renamed by someone else to that name you see on the wiki. Here's the original "MDProg" archive referenced there too:
    https://nemesis.exodusemulator.com/Historic/HackingSite/programs/mdprog.rar

    I have all the sourcecode for this stuff in my archives... somewhere. It's not exactly organised.
     
  18. Clownacy

    Clownacy

    Tech Member
    1,092
    664
    93
    Here are the code tables of a file before and after being reencoded with my compressor:

    Code (Text):
    1. Code 000      of 3 bits encodes nybble 1 of length 1 and was seen 74 times.
    2. Code 001      of 3 bits encodes nybble 0 of length 1 and was seen 40 times.
    3. Code 010      of 3 bits encodes nybble E of length 1 and was seen 40 times.
    4. Code 0110     of 4 bits encodes nybble 9 of length 1 and was seen 37 times.
    5. Code 0111     of 4 bits encodes nybble F of length 1 and was seen 34 times.
    6. Code 1000     of 4 bits encodes nybble 8 of length 1 and was seen 23 times.
    7. Code 10010    of 5 bits encodes nybble 0 of length 5 and was seen 18 times.
    8. Code 10011    of 5 bits encodes nybble 1 of length 2 and was seen 20 times.
    9. Code 1010     of 4 bits encodes nybble 0 of length 4 and was seen 21 times.
    10. Code 10110    of 5 bits encodes nybble 7 of length 1 and was seen 17 times.
    11. Code 10111    of 5 bits encodes nybble 0 of length 2 and was seen 16 times.
    12. Code 11000    of 5 bits encodes nybble 0 of length 3 and was seen 15 times.
    13. Code 11001    of 5 bits encodes nybble 0 of length 6 and was seen 13 times.
    14. Code 11010    of 5 bits encodes nybble C of length 1 and was seen 12 times.
    15. Code 110110   of 6 bits encodes nybble 6 of length 1 and was seen 11 times.
    16. Code 110111   of 6 bits encodes nybble 0 of length 8 and was seen 11 times.
    17. Code 111000   of 6 bits encodes nybble A of length 1 and was seen 9 times.
    18. Code 111001   of 6 bits encodes nybble 5 of length 1 and was seen 8 times.
    19. Code 111010   of 6 bits encodes nybble D of length 1 and was seen 6 times.
    20. Code 111011   of 6 bits encodes nybble A of length 2 and was seen 6 times.
    21. Code 111100   of 6 bits encodes nybble F of length 2 and was seen 6 times.
    22. Code 1111010  of 7 bits encodes nybble 0 of length 7 and was seen 6 times.
    23. Code 1111011  of 7 bits encodes nybble E of length 2 and was seen 5 times.
    24. Code 11111000 of 8 bits encodes nybble 3 of length 1 and was seen 4 times.
    25. Code 11111001 of 8 bits encodes nybble 4 of length 1 and was seen 4 times.
    26. Code 11111010 of 8 bits encodes nybble B of length 1 and was seen 3 times.
    Code (Text):
    1. Code 000      of 3 bits encodes nybble 1 of length 1 and was seen 74 times.
    2. Code 001      of 3 bits encodes nybble 0 of length 1 and was seen 40 times.
    3. Code 010      of 3 bits encodes nybble E of length 1 and was seen 40 times.
    4. Code 0110     of 4 bits encodes nybble 9 of length 1 and was seen 37 times.
    5. Code 0111     of 4 bits encodes nybble F of length 1 and was seen 34 times.
    6. Code 1000     of 4 bits encodes nybble 8 of length 1 and was seen 23 times.
    7. Code 10010    of 5 bits encodes nybble 1 of length 2 and was seen 20 times.
    8. Code 10011    of 5 bits encodes nybble 0 of length 5 and was seen 18 times.
    9. Code 1010     of 4 bits encodes nybble 0 of length 4 and was seen 21 times.
    10. Code 10110    of 5 bits encodes nybble 7 of length 1 and was seen 17 times.
    11. Code 10111    of 5 bits encodes nybble 0 of length 2 and was seen 16 times.
    12. Code 11000    of 5 bits encodes nybble 0 of length 3 and was seen 15 times.
    13. Code 11001    of 5 bits encodes nybble 0 of length 6 and was seen 13 times.
    14. Code 11010    of 5 bits encodes nybble C of length 1 and was seen 12 times.
    15. Code 110110   of 6 bits encodes nybble 0 of length 8 and was seen 11 times.
    16. Code 110111   of 6 bits encodes nybble 6 of length 1 and was seen 11 times.
    17. Code 111000   of 6 bits encodes nybble A of length 1 and was seen 9 times.
    18. Code 111001   of 6 bits encodes nybble 5 of length 1 and was seen 8 times.
    19. Code 111010   of 6 bits encodes nybble 0 of length 7 and was seen 6 times.
    20. Code 111011   of 6 bits encodes nybble A of length 2 and was seen 6 times.
    21. Code 111100   of 6 bits encodes nybble D of length 1 and was seen 6 times.
    22. Code 1111010  of 7 bits encodes nybble F of length 2 and was seen 6 times.
    23. Code 1111011  of 7 bits encodes nybble E of length 2 and was seen 5 times.
    24. Code 11111000 of 8 bits encodes nybble 3 of length 1 and was seen 4 times.
    25. Code 11111001 of 8 bits encodes nybble 4 of length 1 and was seen 4 times.
    26. Code 11111010 of 8 bits encodes nybble 6 of length 2 and was seen 3 times.
    While the codes themselves are the same, and their assignments to nybble runs are exactly as efficient, nybble runs of similar occurrence and the same code length end up in different orders.

    Do you still have the version that produces binary-exact files?
     
  19. Brainulator

    Brainulator

    Regular garden-variety member Member
    Hmm... I see that this is Cucky's art in Sonic 1, and there's a code for nybble B of length 1 that is absent in the recompressed version, replaced by a nybble 6 of length 2. This looks to be more than a sorting thing, though minor nonetheless.
     
  20. Nemesis

    Nemesis

    Tech Member
    I might do, there are some backed up versions of the source. And if not, pretty sure I could reconstruct it.