don't click here

Utility Konami Compression tools

Discussion in 'Engineering & Reverse Engineering' started by vladikcomper, May 11, 2020.

  1. vladikcomper

    vladikcomper

    Tech Member
    205
    134
    43
    Sonic Warped
    tldr; Just another compression format completely cracked, and a fairly good one. So, one more option to potentially pop into your hack or homebrew project, just because~
    This post releases my own cross-platform compressor & decompressor for the Konami compression format as well as the original decompression code for the M68K.
    This is a cross-post from SSRG.



    Oh my, I haven't released anything in a long while! This one isn't anything big though, just a small compression tool for yet another format. I've actually spent some time thinking whether it's even worth releasing. But if some of you guys find it even a tiny bit useful, I'd be more than happy.
    Moreover, I'd be glad to reveal even more research on various Genesis games and their compression formats, but only if this particular release picks a certain degree of interest (I have a bunch of old stuff lying around, but after all it'd require quite a lot of time and effort to get everything polished and sorted~)


    How did this happen? (A short historical notice)
    So, back in 2013, by someone's request, I did a quick disassembly of Contra Hard Corps, a Genesis hit developed by Konami in 1994. The main goal was to figure out basic art and mappings editing. It wasn't long before I discovered samples of the compressed data in ROM and the decompressor code itself. I then quickly cracked and documented the format itself. All that was left to do was to develop a good compressor for it. It wasn't easy for me 7 years ago, but I did wrote my own compressor and it did... well, compress. The compression ratio was sligtly worse than that of the original compressor by Konami, but it did the job needed to get art swapping to work.

    Back then I decided not to release it anywhere (while I could), because, frankly, I thought it wasn't any good.

    Things suddenly changed a few weeks ago, when I decided to revisit some of my older tools and attempted to compile them under GNU/Linux (has been primary OS for quite some time). A spark suddenly hit me, as I decided to improve the code style: the mystery of underperformant compression still sat at the back of my mind after all those years. I then attempted my investigation of which part of my original implementation was responsible for the problem, so I could improve it.

    And that brings us here. As you might've guessed, I completely overhauled my algorithm and wrote the compressor from scratch. To my absolute surprise, this time the new compressor had a better compression ratio than the Konami's original. So, with the job well done, I thought this implementation may at least be worth some attention~

    Funny thing is, just before I went ahead with releasing it, I had found out that the same format had already been cracked a few years after my original research from 2013. I found another compression toolset released in 2015 on romhacking.net by Lab313, and it turned out Konami eventually came up with other variations of the this format for its later games (which were also covered by those tools). However, this toolset has Windows frontend only and I haven't found any actual documentation on the formats publicly available (which is the gap, I'm going to fill now). While that toolset is more likely aimed for art hacking Konami games, my release mainly targets developers and researchers.

    For the sake of simplicity and compatibility, I decided to re-use the previously established name of the format, LZKN1 (as was defined in the aforementioned toolset; my implementation originally refered to it just as "Konami compression").


    How good the format actually is? (Compression methods comparison)

    I believe, the first thing you might ask when presented with a new format is "how well does it compare to our favorite Kosinski and Nemesis?"

    To answer your question, I performed a quick comparison by uncompressing Sonic 2 level art from the stock disassembly (the art used Kosinski compression originally), and tried recompressing it into Nemesis and Kosinski format using the KENSC, as well as to Konami compression format using my own compressor (lzkn):

    [​IMG]

    NOTICE: The original Kosinski compressor Sonic Team used performed worse than KENSC, hence recompressing the same art results in ~1.8% of ROM space saved.

    As you may notice, the Konami Compression, LZKN1, comes pretty close compression-ratio wise. And Nemesis art, even when compressed by the best compressor to date (the original compressor Sonic Team used did way worse) is clearly out of competition (given the time and complexity to decompress it; which is probably why Sonic Team replaced in with Kosinski in the later games).


    What are the advantages exactly? (Brief format documentation)

    Now, another question's probably arose: so, its compression is slightly worse than Kosinski's, but it may benefit in other aspects, right?

    You're not mistaken. This format is extremely simple and efficient to decompress (when compared with Kosinski). It doesn't use variable-length codes to describe compression flags and all of its data elements have a size of exactly 1 byte.

    The decompression code for 68k is also elegantly simple and merely fits just 38 instructions (see the code in the next section).
    For comparison, my own compression format, Comper, with the emphasis on the extreme simplicity takes just 19 instructions in comparison, and the original Kosinski consists of 66 instructions. While comparing number of instructions (on condition that the loops are not unrolled) is nowhere near an accurate measure, this may at least give you some idea of these formats' decompression complexity in comparison with each other.


    Format description

    The compressed data starts with a header, followed by the compressed stream:
    Code (Text):
    1. <data> = <header> <compressed stream>
    The 2-byte header merely encodes uncompressed data size, in bytes. Size is a 16-bit Big-endian integer, meaning up to 64 kb worth of data may be compressed.
    Code (Text):
    1. <header> = <uncompressed size>
    The compressed stream consists of description fields, followed by several data blocks.

    Description fields are 8-bit bitfields that encode whether the following data blocks are uncompressed bytes or commands:
    • If bit == 0, the next data block to read is considered an uncompressed byte. One byte is read from the compressed stream. This byte is appended to the uncompressed stream directly;
    • If bit == 1, the next data block to read is considered a command. One or two bytes are read from the compressed stream, depending on the first bits of the command itself. The command usually initiate copying of the previously uncompressed data, if this data is going to be repeated.
    The description field bits are processed right to left (least significant bits first). When all bits are exhausted (and after the last corresponding data block is fetched), the next byte in the compressed stream is read and considered a new description field.

    Thus, the compressed stream may look something like this:
    Code (Text):
    1. <compressed stream> =
    2. <description field 1> <data block 1> <data block 2> ... <data block 8>
    3. <description field 2> <data block 9> <data block 10> ...
    If data block represents a command, the following formats are possible, depending on the 2 most significant bits in the first byte read (formats below er represented as bitfields for clarity):
    Code (Text):
    1.     %00011111           Stop flag
    2.  
    3.     %0ddnnnnn dddddddd  Uncompressed stream copy (Mode 1)
    4.                         dd..dddddddd = Displacement (0..-1023)
    5.                         nnnnn = Number of bytes to copy (3..33)
    6.  
    7.     %10nndddd           Uncompressed stream copy (Mode 2)
    8.                         nn = Number of bytes to copy (2..5)
    9.                         dddd = Displacement (0..-15)
    10.            
    11.     %11nnnnnn           Compressed stream copy
    12.                         nnnnnn = Number of bytes to copy (8..71)
    As seen above, if command with a value of $1F (%00011111) is met, the decompression stops immediately. Each compressed stream should include the stop command at the end.


    Alright, how to use it? (Compression tools and the 68k decompressor)

    If you're really want to give it a try and you know well what you're doing, go ahead and grab yourself a 68k decompressor and the compression tool for it. That would be a good start.

    The original decompressor for the M68K
    Code (Text):
    1.  
    2. ; ---------------------------------------------------------------
    3. ; Konami's LZSS variant 1 (LZKN1) decompressor
    4. ; ---------------------------------------------------------------
    5. ; INPUT:
    6. ;       a5      Input buffer (compressed data location)
    7. ;       a6      Output buffer (decompressed data location)
    8. ;
    9. ; USES:
    10. ;       d0-d2, a0, a5-a6
    11. ; ---------------------------------------------------------------
    12.  
    13. KonDec:
    14.         move.w  (a5)+,d0        ; read uncompressed data size   // NOTICE: not used
    15.         bra.s   @InitDecomp
    16.  
    17. ; ---------------------------------------------------------------
    18. ; Alternative (unused) decompressor entry point
    19. ; Skips uncompressed data size at the start of the buffer
    20. ;
    21. ; NOTICE: This size is not used during decompression anyways.
    22. ; ---------------------------------------------------------------
    23.  
    24. KonDec2:
    25.         addq.l  #2,a5           ; skip data size in compressed stream
    26. ; ---------------------------------------------------------------
    27.  
    28. @InitDecomp:
    29.         lea     (@MainLoop).l,a0
    30.         moveq   #0,d7           ; ignore the following DBF
    31.  
    32. @MainLoop:
    33.         dbf     d7,@RunDecoding ; if bits in decription field remain, branch
    34.         moveq   #7,d7           ; set repeat count to 8
    35.         move.b  (a5)+,d1        ; fetch a new decription field from compressed stream
    36.  
    37. @RunDecoding:
    38.         lsr.w   #1,d1           ; shift a bit from the description bitfield
    39.         bcs.w   @DecodeFlag     ; if bit=1, treat current byte as decompression flag
    40.         move.b  (a5)+,(a6)+     ; if bit=0, treat current byte as raw data
    41.         jmp     (a0)            ; back to @MainLoop
    42. ; ---------------------------------------------------------------
    43.  
    44. @DecodeFlag:
    45.         moveq   #0,d0
    46.         move.b  (a5)+,d0        ; read flag from a compressed stream
    47.         bmi.w   @Mode10or11     ; if bit 7 is set, branch
    48.         cmpi.b  #$1F,d0
    49.         beq.w   @QuitDecomp     ; if flag is $1F, branch
    50.         move.l  d0,d2           ; d2 = %00000000 0ddnnnnn
    51.         lsl.w   #3,d0           ; d0 = %000000dd nnnnn000
    52.         move.b  (a5)+,d0        ; d0 = Displacement (0..1023)
    53.         andi.w  #$1F,d2         ; d2 = %00000000 000nnnnn
    54.         addq.w  #2,d2           ; d2 = Repeat count (2..33)
    55.         jmp     (@UncCopyMode).l
    56. ; ---------------------------------------------------------------
    57.  
    58. @Mode10or11:
    59.         btst    #6,d0
    60.         bne.w   @CompCopyMode   ; if bits 7 and 6 are set, branch
    61.         move.l  d0,d2           ; d2 = %00000000 10nndddd
    62.         lsr.w   #4,d2           ; d2 = %00000000 000010nn
    63.         subq.w  #7,d2           ; d2 = Repeat count (1..4)
    64.         andi.w  #$F,d0          ; d0 = Displacement (0..15)
    65.  
    66. @UncCopyMode:
    67.         neg.w   d0              ; negate displacement
    68.  
    69. @UncCopyLoop:
    70.         move.b  (a6,d0.w),(a6)+ ; self-copy block of uncompressed stream
    71.         dbf     d2,@UncCopyLoop ; repeat
    72.         jmp     (a0)            ; back to @MainLoop
    73. ; ---------------------------------------------------------------
    74.  
    75. @CompCopyMode:
    76.         subi.b  #$B9,d0         ; d0 = Repeat count (7..70)
    77.  
    78. @CompCopyLoop:
    79.         move.b  (a5)+,(a6)+     ; copy uncompressed byte
    80.         dbf     d0,@CompCopyLoop; repeat
    81.         jmp     (a0)            ; back to @MainLoop
    82. ; ---------------------------------------------------------------
    83.  
    84. @QuitDecomp:
    85.         rts
    86.  


    The compression tool

    lzkn in a tool I created to work with this compression format. It's cross-platform, fully open-source and comes with comprehensive documentation.
    Its repository is available on Github: https://github.com/vladikcomper/konami-compression-tools

    As mentioned earlier, my implementation of the compressor is confirmed to have even better compression ratio than the original compressor used by Konami. For that reason, my tool also supports recompression (decompressing already compressed art and compressing it again), just like KENSC.


    Downloads
    For Linux and MacOS users, it's recommended to build the tool from the source code. For build and installation instructions, please see project's documentation: https://github.com/vladikcomper/kon...uilding-from-the-source-code-and-installation


    Basic usage (Windows users)

    If you're an ordinary Windows user, you may be more comfortable with compressing files via drag&drop on the executable.
    1. Just put lzkn.exe in a folder where it's easy to get.
    2. Drag&drop a file (say file.bin) you want to compress on the lzkn.exe.
    3. That's it, the compressed file should appear in the same folder (say, file.bin.lzkn1 in our example)
    WARNING! Due to limitations of the format itself, only files less than 64kb can be compressed successfully!


    Advanced usage (command line arguments)
    Code (Text):
    1. lzkn [-c|-d|-r] input_path [output_path]
    The first optional argument, if present, selects operation mode:
    • -c = Compress <input_path>;
    • -d = Decompress <input_path>;
    • -r = Recompress <input_path> (decompress and compress again).
    If flag is ommited, compression mode is assumed.

    If [output_path] is not specified, it's set as follows:
    • .lzkn1 extension is appended to the <input_path> in compression mode;
    • .unc extension is appended to the <input_path> in decompression mode;
    • In recompression mode, the same filepath is used.
    Examples:

    The following command compresses file.bin to file.bin.lzkn1:
    Code (Text):
    1. lzkn file.bin
    In decompression mode, the same file.bin.lzkn1 may be decompressed to file.bin.lzkn1.unc by default:
    Code (Text):
    1. lzkn -d file.bin.lzkn1
    Recompress old-compressed.bin to itself:
    Code (Text):
    1. lzkn -r old-compressed.bin
    Compress uncompressed.bin to compressed.bin:
    Code (Text):
    1. lzkn -c uncompressed.bin compressed.bin
    Please note that -c is the default mode and may be omitted.