
The Mystery of Sonic 2's Subtly-Broken Sound Driver Compression

Discussion in 'General Sonic Discussion' started by Clownacy, Jun 7, 2023.

  1. Clownacy


    Tech Member
    This is another cross-post from my blog.

    Sonic the Hedgehog 2's compressed sound driver code has a strange quirk: it's followed by a single '4E' byte. What's more, this byte is counted as being part of the compressed data. Despite this, no such byte is created when compressing this code using the same compressor that Sonic Team did back in 1992.

    To compress the game's music and sound driver code, Sonic Team used Haruhiko Okumura's LZSS compressor. This very compressor's source code can still be found today, allowing people to authentically recompress the game's data using the original compressor.

    The presence of this stray '4E' byte is strange, but even stranger is its apparent lack of presence in Sonic 2's prototypes: when examining Sonic 2's 'Beta 4-8' builds, no '4E' byte can be found after the compressed data. Digging slightly deeper, however, this byte can be found in the earlier August 21st and September 14th prototypes, and yet it is once again absent in the even-earlier Simon Wai prototype. What in the world is going on here?

    It appears to have to do with the size of the compressed data: if the compressed data is an odd number of bytes long, then a '4E' byte is appended to it. This is why the finished version of Sonic 2, along with its August 21st and September 14th prototypes, have this byte: their compressed sound driver code is an odd number of bytes long.

    You might think that this byte is appended to the compressed data to ensure that it is always an even number of bytes long. After all, the Mega Drive's CPU, the Motorola 68000, is sensitive to data that begins at an odd address, so it's usually best to pad data to even addresses wherever possible.

    Unfortunately, that does not appear to be the case: in the prototypes where the compressed data is not an odd number of bytes long, it still has an extra byte appended to it. The only difference is that, in this case, the byte is '00'. This can be hard to notice as the compressed data in the ROM is followed by a region of padding that is filled with '00' bytes. However, this invisible byte can be detected by examining the 68000 instruction that holds the length of the compressed data: regardless of whether the compressed data is an odd or even number of bytes long, the instruction always contains a length that is one byte longer than the actual compressed data, suggesting that a stray byte is always inserted after it.
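    The pattern observed so far can be restated as a small Python sketch. It only encodes the observation (stored length is one more than the real length, and the stray byte depends on parity); it does not explain the cause, and the function name `observed_trailer` is made up for illustration.

```python
def observed_trailer(compressed_len: int) -> tuple[int, int]:
    """Restate the observed pattern for a given true compressed length.

    Returns (length held by the 68000 instruction, value of the stray byte).
    This models what is seen in the ROMs, not why it happens.
    """
    # Odd-length data is followed by 4E, even-length data by 00.
    stray = 0x4E if compressed_len % 2 else 0x00
    # The stored length always counts the stray byte as well.
    return compressed_len + 1, stray
```

    For example, a hypothetical odd length of 0x101 bytes would give a stored length of 0x102 and a stray byte of 4E.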

    I still do not understand why this happens: why would Sonic Team's copy of Okumura's compressor accidentally emit an extra byte at the end, why would this byte be '00' or '4E' depending on the length of the compressed data, and why does it only occur in the compressed sound driver code and not the compressed music data?

    The presence of this garbage byte is not just a waste of ROM, but it also causes a slight bug during decompression: Okumura's LZSS format lacks a 'termination match', meaning that the only way to know when the end of the compressed data has been reached is to count how many bytes of it have been processed. This is why the game contains an instruction which reflects the length of the compressed data: the decompressor uses this to decide when to finish decompression. The fact that this instruction counts the garbage byte as part of the compressed data means that the decompressor ends up processing it as if it were compressed data even though it is not. This results in the decompressor emitting garbage data of its own after the end of the decompressed data, which could potentially overwrite important code or data. In Sonic 2's case, however, this garbage merely overwrites a portion of unused memory.
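    To illustrate how a length that is one byte too long leads to extra output, here is a minimal Python sketch of an Okumura-style LZSS decoder that stops purely by counting input bytes. It is an assumption-laden toy, not the game's 68000 decompressor: the zero-filled window, the `lzss_decompress` name and the tiny input stream are all chosen for illustration, and what the real decompressor emits for the stray byte depends on its state when it reaches it.

```python
def lzss_decompress(data: bytes, length: int) -> bytes:
    """Decode the first `length` bytes of an Okumura-style LZSS stream."""
    N, F, THRESHOLD = 4096, 18, 2   # window size, max match, min match - 1
    window = bytearray(N)           # assumption: zero-filled (Okumura's original uses spaces)
    r = N - F                       # initial write position in the window
    out = bytearray()
    pos = 0
    flags = 0
    while True:
        flags >>= 1
        if not flags & 0x100:       # all 8 flag bits used: fetch a new flag byte
            if pos >= length:
                break
            flags = data[pos] | 0xFF00
            pos += 1
        if flags & 1:               # flag bit 1: a literal byte follows
            if pos >= length:
                break
            c = data[pos]
            pos += 1
            out.append(c)
            window[r] = c
            r = (r + 1) & (N - 1)
        else:                       # flag bit 0: a two-byte match follows
            if pos + 1 >= length:
                break
            i, j = data[pos], data[pos + 1]
            pos += 2
            offset = i | ((j & 0xF0) << 4)
            count = (j & 0x0F) + THRESHOLD + 1
            for k in range(count):
                c = window[(offset + k) & (N - 1)]
                out.append(c)
                window[r] = c
                r = (r + 1) & (N - 1)
    return bytes(out)


# Toy stream: one flag byte (all literals), three literals, then a stray 4E.
stream = bytes([0xFF, 0x41, 0x42, 0x43]) + b"\x4E"
correct = lzss_decompress(stream, 4)   # length excludes the stray byte
too_long = lzss_decompress(stream, 5)  # length counts the stray byte
```

    In this toy case the stray byte happens to be read as a literal, so `too_long` ends up one byte longer than `correct`. There is no terminator in the stream to stop the decoder any earlier, which is why the stored length matters so much.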

    In the end, this bug is a harmless curiosity - one so obscure that the only people who ever seem to notice it are the ones who maintain the Sonic 2 disassembly's build system. In fact, the only other person on the entire internet that I've seen mention this is Xenowhirl, back in 2007.

    For reference, here's a list of Sonic 2 builds and their appended bytes:
    Code (Text):
    Sonic 2 Simon Wai      - even - 0
    Sonic 2 August 21st    - odd  - 4E
    Sonic 2 September 14th - odd  - 4E
    Sonic 2 CENSOR         - even - 0
    Sonic 2 Beta 4         - even - 0
    Sonic 2 Beta 5         - even - 0
    Sonic 2 Beta 6         - even - 0
    Sonic 2 Beta 7         - even - 0
    Sonic 2 Beta 8         - even - 0
    Sonic 2 REV00          - odd  - 4E
    Sonic 2 REV01          - odd  - 4E
  2. Brainulator


    Regular garden-variety member Member
    What a rabbit hole. When I told you about this, I was under the impression that the August 21 and September 14 prototypes lack this byte. I was wrong. Admittedly, I only came across this when trying to make this Sonic 2 Simon Wai disassembly build bit-perfect.

    I wonder if the ones where the padding byte is 0 are just the assembly code being off by 1, given the nature of the subq instruction used to advance the 68K decompressor.