LZ01 compression woes

Discussion in 'Technical Discussion' started by NickW, Mar 29, 2008.

  1. NickW

    Member
    Sega surely loves their weird compression formats :P Anyway, this time I am trying to figure out their LZ01 compression format, and I am lost. I assume it is based on LZ77. This is the header (thanks drx):

    0x00 - 0x03: 4C5A 3031 (LZ01)
    0x04 - 0x07: The compressed file size (4 bytes)
    0x08 - 0x0B: The uncompressed file size (4 bytes)
    0x10: The start of the data

    So the header is 0x10 bytes long. The sliding window length is 2110 bytes (why this number, I have no idea).


    The good thing is that there are uncompressed files in the DS version that are LZ01 compressed in the PS2 version (why would the PS2 version's files need to be compressed anyway?), so at least I know what the uncompressed data is. So, here are 2 files, each of which has an uncompressed version and a compressed version:

    drop3ez.dat (LZ01 compressed)
    drop4ez.dat (LZ01 compressed) (There are a few bytes that are different between the compressed & uncompressed version, but they occur before byte 2048.)

    Anybody have any ideas on this compression format?
     
  2. drx

    mfw Researcher
    First off, the header looks like this:

    0x00-0x03: LZ01 ascii
    0x04-0x07: Compressed size
    0x08-0x0b: Uncompressed size
    0x0c-0x0f: ?
    0x10 - start of the compressed data
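
    A minimal Python sketch of reading that header. The names parse_lz01_header and LZ01Header are made up for illustration, and the little-endian byte order is an assumption, since the layout above does not say how the size fields are stored.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 0x10  # compressed data starts at 0x10


class LZ01Header(NamedTuple):
    compressed_size: int    # 0x04-0x07
    uncompressed_size: int  # 0x08-0x0b
    unknown: int            # 0x0c-0x0f, meaning not known


def parse_lz01_header(blob: bytes) -> LZ01Header:
    """Check the LZ01 magic and read the three 4-byte fields after it."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("too short for an LZ01 header")
    if blob[0:4] != b"LZ01":
        raise ValueError("missing LZ01 magic")
    # ASSUMPTION: fields are little-endian; the thread does not state this.
    compressed, uncompressed, unknown = struct.unpack_from("<III", blob, 4)
    return LZ01Header(compressed, uncompressed, unknown)
```

    If the sizes come out absurdly large on a real file, try ">III" (big-endian) instead.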

    Now, the compressed data starts with a control byte. The control byte is made up of 8 bits (of course), and each bit describes one of the entries that follow it. Starting from the rightmost bit:

    1 - pure, uncompressed copy (copy one byte from the source buffer, at current pointer, to destination buffer, at its current pointer)
    0 - compressed

    E.g. if we have FF 01 02 03 04 05 06 07 08, then FF is the control byte, and all its bits are 1, so we copy 01 02 03 04 05 06 07 08 directly to the decompression buffer.

    If we have 5F 03 00 0F 00 38 EB F0 D2 EB F0, the control byte is 5F, or 01011111 in binary. Reading from the rightmost bit, we copy the first five bytes directly (03 00 0F 00 38). Then we encounter a compression flag, which is two bytes (EB F0). Then we copy D2 directly, then comes a compression flag again (EB F0).
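
    To make the control byte walk concrete, here is a small Python sketch that splits a stream into plain bytes and two-byte compression flags, reading the bits from the rightmost one. iter_tokens is a hypothetical helper, and stopping when the input runs out is an assumption (a real decoder would more likely stop at the uncompressed size from the header).

```python
from typing import Iterator, Tuple


def iter_tokens(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield ("literal", 1 byte) or ("word", 2 bytes) entries in stream order."""
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        for bit in range(8):  # bit 0 is the rightmost bit
            if pos >= len(data):
                return  # ASSUMPTION: stop as soon as the input is exhausted
            if (control >> bit) & 1:
                # 1 = uncompressed copy of a single byte
                yield ("literal", data[pos:pos + 1])
                pos += 1
            else:
                # 0 = compressed entry, two bytes long
                yield ("word", data[pos:pos + 2])
                pos += 2
```

    Feeding it the 5F example gives five literals, the word EB F0, the literal D2, and the word EB F0 again.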

    Now, I don't have time to decipher what the compression flags do, but if you don't figure it out soon, I'll try to have a go at it.

    Hope that helps.
     
  3. NickW

    Member
    That does help a lot. At least it is known how it stores the data, and now I would know how to make the "poor man's" LZ01 compressed file.
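
    A rough Python sketch of that "poor man's" approach, emitting only the data that follows the header: every control byte is FF, so every byte is stored as a plain copy. store_only_body is a made-up name, the header is not generated here, and using FF for a short final group assumes the decoder stops once it has produced the uncompressed size.

```python
def store_only_body(data: bytes) -> bytes:
    """Wrap raw bytes in groups of up to 8 literals, each led by an FF control byte."""
    out = bytearray()
    for i in range(0, len(data), 8):
        # All eight bits set: every entry in this group is an uncompressed copy.
        # ASSUMPTION: FF is also acceptable when the last group is shorter than 8.
        out.append(0xFF)
        out += data[i:i + 8]
    return bytes(out)
```

    The result is slightly larger than the input (one extra byte per 8), which is the price of not compressing anything.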

    I still don't get how those compression flags work. I noticed that EB F0 seems to refer to 00 00 00, so that may mean something.
     
  4. NickW

    Member
    Ok, first I fixed the links in the first post.

    Next, I took a look at the file over the past few days, and it definitely uses a variation of the LZSS compression method. So, EB F0 seems to be telling the decompressor "Go to offset 1 in the decompressed data and repeat for 3 bytes" or something similar to that, although it still confuses me. Help plz.
     
  5. drx

    mfw Researcher
    OK, I fully cracked it and coded a decompressor (that works), I'll put it up in a few hours (I'm in a hurry)
     
  6. drx

    mfw Researcher
    Ok, the decompressor is attached to this post.

    To use it, drag & drop the file you want to decompress onto the .exe, and it will decompress it and create a new file with the extension .out.

    The way the compression words work is this:

    xxyz

    offset = y*256+xx + 18
    count = z+3

    The decompressor copies (count) bytes from the buffer at (offset)

    Note: y*256+xx is a signed, 12-bit number. You *have* to sign extend it to whatever integer type you're using to make it work.

    The best way to sign extend it is to do this:

    (v^0x800)-0x800

    (where v is the 12-bit value y*256+xx and ^ is XOR)
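
    The formulas above can be sketched in Python as follows. decode_word and sign_extend_12 are illustrative names; taking y as the high nibble and z as the low nibble of the second byte follows from the xxyz notation, and sign extending before adding 18 is a literal reading of the note, not something checked against real files.

```python
from typing import Tuple


def sign_extend_12(value: int) -> int:
    """Interpret the low 12 bits of value as a signed number (-2048..2047)."""
    return ((value & 0xFFF) ^ 0x800) - 0x800


def decode_word(first: int, second: int) -> Tuple[int, int]:
    """Split a two-byte compression word 'xx yz' into (offset, count)."""
    xx = first            # whole first byte
    y = second >> 4       # ASSUMPTION: high nibble of the second byte
    z = second & 0x0F     # ASSUMPTION: low nibble of the second byte
    offset = sign_extend_12(y * 256 + xx) + 18
    count = z + 3
    return offset, count


if __name__ == "__main__":
    print(decode_word(0xEB, 0xF0))
```

    For the EB F0 word seen earlier this gives offset -3 and count 3.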

    ;)

    View attachment 2240
     
  7. NickW

    Member
    Thanks drx, that decompressor really helps, though it seems to have an error with some larger files.

    This file for example:
    LZ01 compressed file

    The original looks like this:
    Original File

    But your program decompresses it as this:
    LZ01 decompressed file.

    This error starts at 0x8AC (offset 2220). I would have shown you this file before, but I wasn't sure if Sega fixed the errors with this file (which they didn't).
     
  8. drx

    mfw Researcher
    Yeah, I don't know why that happens, and I won't have time to check this for a while.

    The dictionary/buffer seems to be only 0x800 or 0x1000 wide, using a signed pointer.
     
  9. NickW

    Member
    I'll try to fix it then, although it would be nice if you could fix it. It seems like the window size is most likely 0x800.

    EDIT: Either Java hates me, or I suck at coding. Some numbers are correctly decompressed, but others aren't. I'll just wait till you have the time to fix it.
     
  10. NickW

    Member
    Screw what I had in this post before. I'll just wait till you have the time to fix it, as I absolutely cannot make a working decompressor.

    I should also mention that a few bytes are different in the new compressed & decompressed files I posted. They happen before offset 0x800 though. (And I do not believe the one error in the fever chain after that offset.)
     
  11. NickW

    Member
    I want to say that although it's been a month since the last post, I still haven't forgotten about this and still need help with the decompressor.

    EDIT: Never mind, I got it to decompress correctly. I just need to put in the sliding window. The problem was that the XX variable wasn't converted to a signed byte.