
The Original Kosinski Compressor Source Code Has Been Released

Discussion in 'Engineering & Reverse Engineering' started by Clownacy, May 24, 2025.

  1. Clownacy


    Tech Member
    1,177
    891
    93
    Last year, I discovered that the Kosinski compression format is actually LZEXE, which was used for compressing DOS executables in the late 1980s and the 1990s. Its developer catalogues three versions on his website: v0.90, v0.91, and v0.91e. While only binaries of v0.91 and v0.91e can be found on the website, v0.90 can be found mirrored on various other websites.

    I got in touch with LZEXE's developer, Fabrice Bellard, and he was able to release LZEXE's source code, untouched since 1990! To maximise performance, the compression logic was written in x86 assembly, while its frontend was written in Pascal. This particular source code appears to be for v0.91.

    Back in 2021, I made my own Kosinski compressor which produced identical data to what could be found in the Mega Drive Sonic games. At the time, I noticed that it did not accurately reproduce the Mega CD BIOS's compressed Sub-CPU payload data. The inaccuracies were so extensive that it appeared that the BIOS's data was compressed with a different tool to the Sonic games. Notably, the compressor which was used for the Sonic games suffered from a number of bugs and shortcomings, causing the compressed data to be less efficient than it should have been. The Mega CD BIOS developers may have used a different version of the compressor, which lacked these bugs, or which had additional bugs.
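
    A minimal sketch of what "accurate" means in this context: recompressing the decompressed data must give back the original compressed bytes exactly. The names `first_mismatch` and `is_accurate`, and the `compress` callable, are hypothetical and stand in for whichever compressor is being tested; they are not taken from any of the tools mentioned above.

```python
def first_mismatch(expected, actual):
    """Return the index of the first differing byte, or None if identical."""
    for index, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return index
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


def is_accurate(compress, original_compressed, decompressed):
    """True if compress() reproduces the original compressed data byte for byte.

    `compress` is a hypothetical callable taking and returning bytes.
    """
    return first_mismatch(original_compressed, compress(decompressed)) is None
```

    Reporting the offset of the first mismatch, rather than just a yes/no answer, makes it easier to see which part of the stream a compressor starts to diverge in.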

    With this in mind, the source code which has been released may not be for the exact compressor which was used by the Sonic games, though it could be modified to function identically to it. Since the compression logic was written in assembly, it should be simple enough to disassemble the compressor executables and compare them to the source code. Devon did the heavy lifting of extracting and unpacking the core logic, which can be found here.

    With that, we now have the source code of two of the four 'KENS' format compressors - Kosinski and Saxman! Unfortunately, I do not have much hope of ever finding the original compressors for, let alone the source code of, the remaining two formats - Enigma and Nemesis. They are evidently custom formats which were designed specifically for the Mega Drive, likely meaning that the compressors and their source code never left the hands of Sega. Enigma encodes plane map data, operating on 16-bit words and specifically acknowledging the separation of the bits of the tile's index from its X/Y flip, palette line, and priority; meanwhile, Nemesis encodes tiles, operating on nibbles and bunching data into groups of 32 bytes (8 x 8 4-bit nibbles).
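
    To make the data layouts mentioned above concrete, here is a rough sketch of the two kinds of Mega Drive data involved. It only unpacks raw data and does not implement Enigma or Nemesis themselves. The bit positions are those of the standard Mega Drive VDP plane map entry, which the post does not spell out, and the function names are hypothetical.

```python
def decode_plane_map_word(word):
    """Split a 16-bit plane map entry into its fields.

    Assumed layout (standard VDP name table entry):
    bit 15 priority, bits 14-13 palette line, bit 12 Y flip,
    bit 11 X flip, bits 10-0 tile index.
    """
    return {
        "priority": (word >> 15) & 0x1,
        "palette_line": (word >> 13) & 0x3,
        "y_flip": (word >> 12) & 0x1,
        "x_flip": (word >> 11) & 0x1,
        "tile_index": word & 0x7FF,
    }


def unpack_tile(tile_bytes):
    """Expand one 32-byte tile into 8 rows of 8 4-bit pixel values."""
    if len(tile_bytes) != 32:
        raise ValueError("a tile is exactly 32 bytes (8 x 8 nibbles)")
    rows = []
    for y in range(8):
        row = []
        for byte in tile_bytes[y * 4:(y + 1) * 4]:
            row.append(byte >> 4)    # high nibble is the left pixel
            row.append(byte & 0xF)   # low nibble is the right pixel
        rows.append(row)
    return rows
```

    For example, `decode_plane_map_word(0xE801)` gives priority 1, palette line 3, no Y flip, X flip set, and tile index 1.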

    I would add this information to the wiki, but I am still receiving '403 Forbidden' errors which prevent me from even looking at the wiki, let alone adding to it.

    EDIT: Fabrice has now re-released the source code under the MIT licence, allowing it to be freely used in other projects!
     
    Last edited: May 24, 2025
    • Like x 14
    • Informative x 2
  2. Chimes


    The One SSG-EG Maniac Member
    1,067
    740
    93
    Am I dreaming
    Is this real
    This feels like something SR would make for April Fools
    This is galactically important. Holy shit. Nice!
     
    • Agree x 3
    • Like x 2
  3. Brainulator


    Regular garden-variety member Member
    Nice! Do you think anything like this will be incorporated into the disassemblies?
     
  4. Clownacy


    Tech Member
    1,177
    891
    93
    Maybe someday, since it would be nice to be able to swap the compressor for a better one to automatically make all of the compressed data smaller.

    Personally, I would rather only do so once all compression formats have accurate compressors, otherwise the migration to build-time compression would be incomplete. We have accurate compressors for Kosinski, Nemesis, and Saxman, but Enigma is proving to be a huge headache for me: I have a mostly-accurate compressor on GitHub, but I am currently stumped by how the original compressor decided the 'incremental copy word' value in the header - I swear, it is picked at random!

    The disassemblies would likely use my accurate Kosinski compressor rather than the original Kosinski compressor, since Bellard's code is written in x86 assembly, making it unportable to other platforms like ARM-based Macs and the Raspberry Pi. My compressor is written in C, so it lacks this limitation. I do need to compare the original x86 assembly to my compressor to see if I got any details incorrect, though. It is satisfying that, after all these years, I can finally see if I was correct about how the original compressor's code was written!
     
    • Like x 2
    • Informative x 1