# Brett Kosinski format

Discussion in 'Engineering & Reverse Engineering' started by Sonic Hachelle-Bee, Nov 6, 2004.

1. ### Hayate

Tech Member
>_<

I keep getting you mixed up with Fiz... damn avatars...

Member
Me?

3. ### Sonic Hachelle-Bee

Taking a Sand Shower Tech Member
Lyon, France
Sonic 2 Long Version
There is something to add under the Separate compression. It's the DDDDD value.
The first post has been edited as well.

Separate compression:

This is when there is 01 under the bitfield.
The separate compression uses at least 2 bytes, and sometimes 3 (the last is optional). In binary form:

NN DC (CC) = NNNN NNNN DDDD DCCC (CCCC CCCC)

NN is another negative value. Unlike IC, this one tells the game where to read/copy the (uncompressed) data from. Writing FF for NN will read/copy from the previous byte only. Writing another value will start reading that many bytes back, and keep copying from there until you have CC of them.

DD is again a negative value, on 5 bits. It is an add-on to NN: take NN and subtract 256 (100 in hex) * |DD+1|, then use the result as your new NN value.

DD = 11111 = -1 --> NN = NN - (256 * |-1+1|) = NN (Do nothing)
DD = 11110 = -2 --> NN = NN - (256 * |-2+1|) = NN - 256
DD = 11101 = -3 --> NN = NN - (256 * |-3+1|) = NN - 256 * 2 = NN - 512
DD = 11100 = -4 --> NN = NN - (256 * |-4+1|) = NN - 256 * 3 = NN - 768
...
DD = 00000 = -32 --> NN = NN - (256 * |-32+1|) = NN - 256 * 31 = NN - 7936

Example:
66 67 FF F8 0D --> NN = -1 and DD = 11111 --> NN = -1, we are starting at 67.
66 67 FE F8 0D --> NN = -2 and DD = 11111 --> NN = -2, we are starting at 66.
...
66 67 FF F0 0D --> NN = -1 and DD = 11110 --> NN = -257, we are starting 257 bytes before.
66 67 FD E7 --> NN = -3 and DD = 11100 --> NN = -771, we are starting 771 bytes before.
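
As a rough check of the NN/DD arithmetic above, here is a minimal Python sketch. The function name `full_offset` is made up for illustration, and treating NN = 00 as -256 (so that FF = -1, FE = -2, and so on) is an assumption the post implies but does not state.

```python
def full_offset(nn: int, dc: int) -> int:
    """Return the (negative) distance to copy from, given the NN and DC bytes."""
    nn_signed = nn - 256            # FF -> -1, FE -> -2, ... (00 -> -256 is assumed)
    dd = (dc >> 3) - 32             # top 5 bits of DC: 11111 -> -1, 00000 -> -32
    return nn_signed - 256 * abs(dd + 1)


# The four examples from the post: FF F8, FE F8, FF F0, FD E7
for nn, dc in [(0xFF, 0xF8), (0xFE, 0xF8), (0xFF, 0xF0), (0xFD, 0xE7)]:
    print(f"{nn:02X} {dc:02X} -> {full_offset(nn, dc)}")
```

Under that assumption, the result is the same as reading DDDDD followed by NN as one 13-bit number and subtracting 8192.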

CC: Count.
If you have no more than 9 bytes to read/copy, then the last CC byte is not needed.

F0 --> Like F8 (the count is in the last byte), but DD = 11110, so NN = NN - 256.
F1 --> Copy the read data for 3 bytes, with DD = 11110 (NN = NN - 256).
F2 --> Copy the read data for 4 bytes, with DD = 11110 (NN = NN - 256).
F3 --> Copy the read data for 5 bytes, with DD = 11110 (NN = NN - 256).
F4 --> Copy the read data for 6 bytes, with DD = 11110 (NN = NN - 256).
F5 --> Copy the read data for 7 bytes, with DD = 11110 (NN = NN - 256).
F6 --> Copy the read data for 8 bytes, with DD = 11110 (NN = NN - 256).
F7 --> Copy the read data for 9 bytes, with DD = 11110 (NN = NN - 256).
F9 --> Copy the read data for 3 bytes.
FA --> Copy the read data for 4 bytes.
FB --> Copy the read data for 5 bytes.
FC --> Copy the read data for 6 bytes.
FD --> Copy the read data for 7 bytes.
FE --> Copy the read data for 8 bytes.
FF --> Copy the read data for 9 bytes.

Otherwise, write F8 = 1111 1000 (CCC = 000) as the second byte, and use the last byte to write your actual count minus 1:
F8 09 --> Copy the read data for 10 bytes.
F8 0A --> Copy the read data for 11 bytes.
...

Writing 00 F8 00 (NN = 00 and both counts are 00's) will end the compressed data.
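
The count rules can be sketched in Python like this. `copy_count` is a hypothetical helper name, not something from the game code. The post only gives 00 F8 00 as the end marker, so treating any 00 last byte as "end of data" regardless of NN is an assumption; any other special meaning the last byte might have is not covered here.

```python
from typing import Optional


def copy_count(dc: int, extra: Optional[int] = None) -> Optional[int]:
    """Return how many bytes to copy, or None for the end-of-data marker."""
    ccc = dc & 0b111                # low 3 bits of the second byte
    if ccc != 0:
        return ccc + 2              # 001 -> 3 bytes ... 111 -> 9 bytes, no third byte
    if extra is None:
        raise ValueError("CCC = 000 needs a third byte")
    if extra == 0:
        return None                 # assumed: a 00 third byte ends the data
    return extra + 1                # third byte stores the count minus 1


print(copy_count(0xFD))         # CCC = 101
print(copy_count(0xF8, 0x09))   # count taken from the third byte
```

With the values from the lists above, FD gives 7 and F8 09 gives 10.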

In our example:
53 FB 01 23 FE 67 FF F8 0D 98 FF FD 65 98 75 FB B2 00 15...
We have 2 SC: FF F8 0D and FF FD.

67 FF F8 0D = 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 (67 + 0Ex67)
98 FF FD = 98 98 98 98 98 98 98 98 (98 + 07x98)
Another: 12 34 56 78 9A BC DE 10 FA F9 = 12 34 56 78 9A BC DE 10 56 78 9A
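
To double-check the three expansions above, here is a small Python sketch that applies a single separate-compression entry to bytes that are already uncompressed. `apply_separate` is a made-up name, and the byte-by-byte copy loop is my reading of how the data gets repeated (it is what lets FF repeat the previous byte), not something taken from the game's decompressor.

```python
from typing import Optional


def apply_separate(out: bytearray, nn: int, dc: int, extra: Optional[int] = None) -> None:
    """Append one separate-compression copy (NN DC [CC]) to `out`."""
    dd = (dc >> 3) - 32                        # DDDDD as a negative value, -1..-32
    offset = (nn - 256) - 256 * abs(dd + 1)    # negative distance back from the end of `out`
    ccc = dc & 0b111
    if ccc == 0 and extra is None:
        raise ValueError("CCC = 000 needs a third byte")
    count = ccc + 2 if ccc else extra + 1      # CCC = 000: count is in the third byte
    start = len(out) + offset
    if start < 0:
        raise ValueError("offset points before the start of the data")
    for i in range(count):
        out.append(out[start + i])             # one byte at a time, so the copy can overlap itself


buf = bytearray.fromhex("67")
apply_separate(buf, 0xFF, 0xF8, 0x0D)
print(buf.hex(" "))

buf = bytearray.fromhex("12 34 56 78 9A BC DE 10")
apply_separate(buf, 0xFA, 0xF9)
print(buf.hex(" "))
```

The first call should give fifteen 67 bytes, and the second should append 56 78 9a to the original eight bytes.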

4. ### nineko

I am the Holy Cat Tech Member
italy
This is useful. :yes:

5. ### Dark Sonic

Member
Working on my art!
But you're not D: I think we have a winner! 3 year bump!

6. ### Ritz

Subhedgehog Member
Oh god oh god a bump in a dead forum this disrupts the sanctity of the entire board

7. ### SMTP

Tech Member
I wouldn't even consider it a bump in the archives anyway.

8. ### Dark Sonic

Member
Working on my art!
I know but still... such a huge bump.

9. ### Sik

Sik is pronounced as "seek", not as "sick". Tech Member
being an asshole =P
Taking advantage of the bump: would you like to keep calling it Kosinski or not? Seriously. After examining Allegro's packfiles, the Kosinski format turned out to be just a somewhat improved version of LZSS (improved in that small offsets require fewer bytes). I'm not kidding. Go and check the LZSS file in the Allegro sources and read the comment at the start. It describes how the packfile compression works. It's very similar.

10. ### drx

mfw Researcher
:rolleyes:
It doesn't matter who invented the compression format. If we followed your line of reasoning, we would have to name every compression format whose origins we don't know '???1', '???2', etc.

The point of naming a compression format after the person who cracked it is that by doing this, you show your gratitude towards the cracker for doing it. I, for one, am very grateful to Kosinski and Nemesis for cracking their formats. If it weren't for them, we probably wouldn't be where we are as a community. Surely someone else would have cracked the formats later (say, me), but they made it possible first.

We already know what these compressions are really called, anyway.

11. ### nineko

I am the Holy Cat Tech Member
italy
I can see your point here, but the music engine is called "SMPS" and not "Saxman".

12. ### Tweaker

Banned
"Saxman" is the name of the music compression used in Sonic 2. So it's not trying to call itself the music format, but rather the compression.

Unless I'm completely missing what you just said.

13. ### Nemesis

Tech Member
LZ77 is a method/theory of compression. LZSS is a painfully obvious addition to LZ77 compression, which barely warrants mentioning IMO.

Kosinski is an implementation of LZ77 compression. There are many others, and they are all similar. PRS compression is also LZ77 based. LZ77 is one of the most popular compression methods. In fact, of the 8 or so Sonic-related compression formats I've worked on, I think 6 of them have been LZ77 based. Being based on the same compression theory means they are going to have a lot in common. They are unique implementations, however, and there are a lot of implementation details which are not defined in the compression method, such as how the bit tags are embedded, how the offset/count pairs are specified, additional offset/count formats and how they are indicated, the number of bits for the copy count vs. the offset in each pair, how the end of file is marked, etc. There's also a lot of careful measurement and testing that goes into choosing precisely how the offset/count pairs are balanced to produce the best compression ratios, and the kind of data you are compressing affects the choice greatly.

When you get down to it, there are only a handful of actual methods for lossless data compression. The high compression formats we use today, such as zip, rar, 7z, etc., all rely on the same basic compression methods. These more advanced "superformats" simply add header information which allows them to choose between a variety of methods and select the best, or even combine several methods, to achieve the best compression ratios throughout a file. I could write a paper called "Uber Compression" which defines this compression method in a generic way. It wouldn't do away with the need to keep the distinction between rar and zip, however. We name things by their implementation. Theories and methods are only useful to academics.