Andlabs, on 09 June 2012 - 02:09 PM, said:
Is there anyone who speaks Brazilian Portuguese who can translate this?
http://gazetadealgol...lz_psg1_e_2.pdf
It's technical documentation for one of the compression formats used by earlier entries in the series; the tool on their website doesn't seem to work (for Nicole at least)? On top of that, the games bundle these files together with seemingly no way to determine where a file starts or ends? Thanks.
SUP
Quote
Information about LZ?? compression used in the game Phantasy Star Generations 1 [PS2]
Every compressed file uses a header like so:
CM(ASCII) = 2 bytes
(Uncompressed size) = 4 bytes
(Compressed size) = 4 bytes
Data start = (until indicated compressed size)
FLAG ID = (I'll soon explain what it is)
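For anyone who wants to poke at the files, here is a minimal sketch of reading that header in Python. The function name `parse_header` is made up for illustration, and the little-endian byte order is taken from the explanation further down in the document. The sample bytes are the ones from the document's own example file.

```python
import struct

def parse_header(blob: bytes):
    """Return (uncompressed_size, compressed_size) from the 10-byte header."""
    # "<" = little-endian, no padding; "2s" = two raw bytes; "I" = unsigned 32-bit
    magic, uncompressed_size, compressed_size = struct.unpack_from("<2sII", blob, 0)
    if magic != b"CM":
        raise ValueError("missing CM signature")
    return uncompressed_size, compressed_size

# Header bytes quoted in the document: CM, 0C410000, D81D0000
header = bytes.fromhex("434D" "0C410000" "D81D0000")
print([hex(n) for n in parse_header(header)])
```

The two sizes come out as 0x410C and 0x1DD8, which matches the 0x1DD8 worked out by hand below.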
First of all, the compression took a while to figure out. I had to read many documents about other compression schemes: LZSS (I couldn't find any that helped, so I worked it out by myself), LZ77, LZW, LZX, LZRW, LZP, Constant Mixing (I thought it might be that because of the CM in the header) and Huffman ("An insult to the human being's brain").
In this document I'll show graphically how the compression works, so that it and its logic are easy to understand.
[psg1 boxart]
Look at the structure of the compressed file, which starts with CM (43 4D)
[screenshot]
Now look at the screen showing the file structure
[screenshot]
[yellow] The dear CM. CoMpressed? Contant[sic] Mixing? I don't know...
[red] Data size after decompressing
[tan] "The butter of your bread." I'll explain why later.
[black] And the black is DATA DATA
Well, D81D0000 with its bytes reversed (the PS2 processor is little-endian) is 0x1DD8, so that should be the compressed size. Let's see if that's so.
[screenshot]
I begin counting from the 11th byte because the first 10 bytes are the header:
CM = 2 bytes
0C410000 = 4 bytes
D81D0000 = 4 bytes
Obviously: 2+4+4=10, so the 11th byte is the start.
[screenshot]
At the end of the file, I was confused by the following fact:
0x1DD8 != 0x2056
That's more than obvious.
So let's look at the RAM, since the decompressed file is there.
"RAM dumping procedure" (this document gives no further details).
[screenshot]
Now I'll explain exactly how the compression works.
0B 00 01 00 -> these can't be copied from anywhere else (of course not, they're at the beginning of the file)
Next we have 00 00 00 00 00 00 00 00 repeating, so here comes the first LZ pair:
00 50 -> 00 says which byte to repeat. Splitting 50 in the middle gives 5. An LZ pair already takes 2 bytes in the file, so a length of 0 or 1 would be useless; the length is therefore 5 + 2 + 1 (in hex you count from zero) = 8 bytes.
24 - no compression
Next comes 00 00 00, which you can "borrow" from a little way back.
03 00 -> the 00 gives 0 + 3 (hex counts from zero, etc.) = 3 bytes; the 03 says to find the closest 00 00 00 right behind.
0B 01 00 00 00 00 00 is right behind, so let's borrow it:
0F 50 -> splitting 50 gives 5, and 5 + 2 + 1 (hex counts from zero, etc.) = 8, so take 8 bytes from 0F bytes behind.
[screenshot]
In LZ the zero counts as well -^ 0F -^
With that I don't even have to explain the other two, right?
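The three pairs above can be written out as a small Python sketch. `decode_pair` is a hypothetical name. The length rule (high nibble + 3) is what the document describes; the distance rule (first byte + 1, so that 00 means "the byte just before") is my own reading of "the zero counts as well" and is an assumption. The document never says what the low nibble of the second byte does, so it is ignored here.

```python
def decode_pair(first: int, second: int):
    """Return (distance, length) for one two-byte LZ pair."""
    # Length: high nibble of the second byte, plus 2, plus 1.
    length = (second >> 4) + 3
    # ASSUMPTION: the first byte counts back from zero, so 00 = one byte back.
    distance = first + 1
    # NOTE: the low nibble of `second` is unexplained in the document; ignored.
    return distance, length

for pair in ("0050", "0300", "0F50"):
    first, second = bytes.fromhex(pair)
    print(pair, decode_pair(first, second))
```

Under these assumptions the lengths come out as 8, 3 and 8, the same as the hand calculation, and 0F reaches 16 bytes back.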
But looking at this, something is missing. Where are the FLAGs? I see no sign of what marks a byte as compressed or not, and a decoder can't just guess where an LZ pair is.
Remember the "butter of your bread" right above? To save you the work of going up a few pages:
[super tiny screenshot]
Pay attention to the value; the problem is simpler than you think. Let's count the size of the compressed data, remembering that it starts at the 11th byte because of the header.
[screenshot]
Stopping here gives exactly the compressed size from the header.
[screenshot]
So what are the remaining bytes? They don't seem to say how many bytes to skip to the next LZUnknow[sic] compression, do they?
Actually yes it does.
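In other words, the file splits into three parts: the 10-byte header, the compressed data (as many bytes as the header says), and whatever is left over, which is the flag area. A rough Python sketch of that split follows; `split_file` is a made-up name, and the tiny test file is invented purely to show the slicing, it is not real game data.

```python
HEADER_SIZE = 10  # "CM" (2 bytes) + uncompressed size (4) + compressed size (4)

def split_file(blob: bytes):
    """Split a CM file into (data, flags) using the compressed size field."""
    compressed_size = int.from_bytes(blob[6:10], "little")
    data_end = HEADER_SIZE + compressed_size
    return blob[HEADER_SIZE:data_end], blob[data_end:]

# Invented example: 3 data bytes followed by 1 flag byte.
fake = b"CM" + (3).to_bytes(4, "little") * 2 + b"\x01\x02\x03" + b"\xAA"
print(split_file(fake))

# For the file in the document, the flags would start at this offset:
print(hex(HEADER_SIZE + 0x1DD8))
```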
First of all, the PS2 is little-endian, so we have to reverse the order in which the data is read. Let's take an example with only the three bytes D0 41 ED. Use your OS's calculator and set it to scientific mode.
[screenshot]
Change the calculator value to HEXADECIMAL.
Since the PS2 processor reads the bytes, and also the bits, the other way around, put the borrowed bytes in reverse order: D0 41 ED becomes ED41D0. Now comes the key to decompression: convert these bytes to binary.
[screenshot]
Now to explain that:
0 = 1 byte uncompressed
AND
1 = 2 bytes indicating a LZ compression
Reading right to left, we have four zeroes, right? So the first four bytes are uncompressed:
0B 00 01 00
0 0 0 0
Then comes a 1, followed by a 0 and two 1s:
LZ UNCOMPRESSED LZ
00 50 24 03 00
1 0 1
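The calculator steps can be reproduced in a few lines of Python. Reversing the bytes and then reading the binary right to left is the same as taking each flag byte in file order and reading its bits from the lowest one up, which is what this sketch does. `flag_bits` is a hypothetical helper name.

```python
def flag_bits(flags: bytes):
    """Return the flag bits in the order they are consumed (lowest bit first)."""
    return [(byte >> i) & 1 for byte in flags for i in range(8)]

# First flag byte from the document: D0 = 11010000 in binary.
print(flag_bits(bytes([0xD0])))  # [0, 0, 0, 0, 1, 0, 1, 1]

# The "reverse the bytes" step from the calculator walkthrough:
print(hex(int.from_bytes(bytes.fromhex("D041ED"), "little")))  # 0xed41d0
```

The first list reads as four uncompressed bytes, then LZ, uncompressed, LZ, LZ, the same pattern as in the example above.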
Playing a little more:
11000001
LZ LZ LZ
0F 50 05 10 5C 06 00 00 32 13 20
1 1 0 0 0 0 0 1
That way the bytes 0F 50, 00 32 and 13 20 are LZ pairs, each one marked by a 1 in the binary flags.
This explains the workings of the compression used in the Phantasy Star Generations 1 and 2 games [PS2].
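Putting the pieces together, a decoder could look roughly like the Python sketch below. This is only my reading of the document, not a tested tool. Assumptions: flag bits are read lowest bit first, 0 copies one byte as-is, 1 reads a two-byte pair, the length is the high nibble of the second byte + 3, and the distance is the first byte + 1. The low nibble of the second byte is not explained in the document and is ignored here, so real files may well need more than this. The name `decompress` is made up.

```python
def decompress(tokens: bytes, flags: bytes, out_size: int) -> bytes:
    """Sketch of the CM decoder described above (see assumptions in the text)."""
    out = bytearray()
    pos = 0        # read position in the compressed data
    bit_index = 0  # index of the next flag bit
    while len(out) < out_size and pos < len(tokens):
        flag = (flags[bit_index // 8] >> (bit_index % 8)) & 1
        bit_index += 1
        if flag == 0:
            # 0 = one uncompressed byte
            out.append(tokens[pos])
            pos += 1
        else:
            # 1 = two-byte LZ pair
            first, second = tokens[pos], tokens[pos + 1]
            pos += 2
            length = (second >> 4) + 3   # high nibble + 2 + 1
            distance = first + 1         # ASSUMPTION: 00 = one byte back
            for _ in range(length):
                # Byte-by-byte copy, so overlapping runs repeat correctly.
                out.append(out[-distance])
    return bytes(out)

# The opening bytes walked through in the document, with flag byte D0.
tokens = bytes.fromhex("0B000100" "0050" "24" "0300" "0F50")
print(decompress(tokens, bytes([0xD0]), 24).hex(" "))
```

The output here follows only from the assumptions above; it has not been checked against a RAM dump from the game.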
The game can be modified without recompressing, but the ISO would need to be rebuilt, since the uncompressed data would obviously be bigger than the original.
(no it doesn't make much sense in portuguese either)
This post has been edited by Syniphas: 11 June 2012 - 07:29 PM