Optimized KosDec and NemDec, considerably faster decompression (UPDATE: New improved Kosinski and Nemesis compressors by flamewing)

#1 User is offline vladikcomper 

Posted 04 November 2013 - 03:22 PM

  • Posts: 182
  • Joined: 13-June 09
  • Gender:Male
  • Location:Russia
  • Project:Sonic Warped
  • Wiki edits:1
A much faster Kosinski decompressor for the 68k

Kosinski is a variant of the LZSS compression algorithm used in various Sega Genesis games. Besides being much faster than the Nemesis algorithm to decompress, this format also achieves a better compression ratio on data with frequently repeated sequences of bytes. The 68k decompression code found in many Genesis games appears to be identical; the routine is known as 'KosDec' in most Sonic disassemblies. Although written in quite a careful and accurate manner, it still suffers from some optimization flaws, both in its logic and in its choice of instructions.

I started looking into ways of reducing decompression time after I once attempted to decompress big blocks of data on the fly. The process was time-critical, and the original decompressor was doing pretty well, but not as well as I wanted: it was balancing on the edge, and I needed just slightly faster processing to get rid of unwanted slowdown. Back then I analyzed the entire decompressor's code in order to optimize it and get the best out of it. I figured out a few tricks which bought me about 5% faster decompression, which was enough at the time.

Later I realized that the decompression routine could be optimized even further, but that would require more fundamental changes. So I decided to create my own implementation of a Kosinski decompressor, applying all the optimization tricks I had planned and more. It does the same thing, but in a different way, using different, more efficient code logic.

The result exceeded all my expectations: the very first revision of my own Kosinski decompressor was overall 40% faster than the original. Later on, MarkeyJester and flamewing suggested a few more ideas, which helped to optimize it even further. Initially I didn't plan to use extreme optimization tricks like look-up tables (LUTs) to simplify calculations, but flamewing's tests showed a performance increase I didn't expect, which quite makes up for the larger decompression code.

Now, let me present to you an extremely fast Kosinski decompressor!

http://pastebin.com/EtYf1vz6

The performance increase compared to the original decompressor is unreal. Believe it or not, it works more than 1.5 times faster. I've set up several tests to measure the new decompressor's effectiveness. Here's a test involving decompressing the original EHZ art from Sonic 2:

[Screenshots: decompression timing comparison, original vs. optimized KosDec]

Using the new decompressor will not only greatly reduce loading times for any Kosinski-compressed art, but also give you more freedom with "on the fly" decompression, since you will be able to decompress nearly twice as much data within one TV-frame.
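
For a rough sense of that budget (my own back-of-the-envelope figure, not a number from the tests above): an NTSC Mega Drive's 68k runs at about 7.67 MHz, so one frame leaves roughly 7,670,000 / 60 ≈ 128,000 CPU cycles for everything, decompression included.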

An optimized Nemesis decompressor for the 68k

Unlike Kosinski, Nemesis is an entropy-based compression method. Due to the method's nature, characters in the compressed stream have variable sizes in bits, making the decompression process far more time-consuming: the compressed stream has to be read bit by bit and analyzed to decode each character. Although its compression ratio was rather high, it took about 10 times longer than Kosinski to decompress. Kosinski also provided quite a close compression ratio, which was even higher for many files with frequently repeated byte sequences. That is likely why Sonic Team slowly migrated away from this format starting with Sonic 2; most of the compressed art in Sonic 3K was already in Kosinski format.
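
To illustrate where the time goes, here is a generic bit-by-bit prefix-code reader in C. It is a sketch of the technique only: the flat 'tree' table is hypothetical, and this is not the actual Nemesis code layout (NemDec works from a code table it builds in RAM):

#include <stdint.h>

typedef struct {
    const uint8_t *src;  /* compressed input stream               */
    uint32_t buf;        /* bit reservoir; start with buf = n = 0 */
    int n;               /* number of valid bits left in buf      */
} BitReader;

static int get_bit(BitReader *br)
{
    if (br->n == 0) {            /* refill one byte at a time */
        br->buf = *br->src++;
        br->n = 8;
    }
    return (int)(br->buf >> --br->n) & 1;   /* consume MSB first */
}

/* Walk a binary code tree one bit at a time until a leaf is hit.
 * 'tree' is a flat array: a non-negative entry is the index of a
 * node's first child (the second child follows it); a negative
 * entry -(symbol + 1) marks a leaf. */
static int decode_symbol(BitReader *br, const int16_t *tree)
{
    int node = 0;
    while (tree[node] >= 0)
        node = tree[node] + get_bit(br);
    return -tree[node] - 1;
}

Every decoded symbol costs a data-dependent number of shifts and branches; that per-bit overhead is exactly what byte-oriented formats like Kosinski avoid.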

I attempted to optimize the Nemesis decompression routine, known as 'NemDec', shortly after my success with the Kosinski decoder. However, even though Nemesis decompression is terribly slow, the original code was surprisingly well-designed. The algorithm is quite complex and uses nearly all the registers the processor provides, leaving me almost no room for the usual optimization tricks.

I managed to pull off just a few optimization tricks, so the improvement isn't dramatic. Still, a small gain over the original version was achieved, which, considering how long decompression takes, saves up to 4 frames when decompressing a single level art file in Sonic 1.

http://pastebin.com/cAJSJ0np

The performance increase is about 5%, as seen below: this test program decompressed the original GHZ art from Sonic 1, and the optimized version finished 3 frames faster.

[Screenshots: decompression timing comparison, original vs. optimized NemDec]

* * *

Also, stay tuned! More on Kosinski and Nemesis compression shall come soon.
This post has been edited by vladikcomper: 04 November 2013 - 07:05 PM

#2 User is offline Cinossu 

Posted 04 November 2013 - 05:34 PM

  • inverted with love~
  • Posts: 2811
  • Joined: 21-June 04
  • Gender:Male
  • Location:London, UK
  • Project:Sonic the Hedgehog Extended Edition
  • Wiki edits:474
Highly impressive, and highly useful. I shall definitely be making use of the two of these, ta very much. :>
I would definitely say document this on the wiki under either a SCHG general tutorial of sorts or under the individual compression method pages.

#3 User is offline flamewing 

Posted 04 November 2013 - 07:03 PM

  • Emerald Hunter
  • Posts: 1138
  • Joined: 11-October 10
  • Gender:Male
  • Location:🇫🇷 France
  • Project:Sonic Classic Heroes; Sonic 2 Special Stage Editor; Sonic 3&K Heroes (on hold)
  • Wiki edits:12
Part two is by me, so here it goes!

Improved compressors and decompressors
When we were working on the above decompressor, vladikcomper mentioned that he thought KENS' LZSS encoder did not produce optimal sizes. He felt that if you broke some runs earlier, you could get better compression. I went with that idea and did some research -- as in, reading scholarly computer science papers. It did not take long to see that this was true, and that there were methods to achieve optimal compression. The papers were behind a paywall; and when I did get to them, they turned out to be a rather simple report from a recent conference. So I kept digging more and more. I found more articles, one of which mapped LZSS compression to graph theory. If you are interested in knowing more, I added lots of comments in the source code explaining what is going on (see below).

So I set about writing a Kosinski implementation with perfect compression. But rather than doing just that, I made a framework for perfect LZSS compression, which I proceeded to apply to Kosinski, Saxman and Comper (an LZSS-based format created by vladikcomper with decompression speed as its primary goal).

Some folks know that I have also written a much-improved Nemesis encoder; FraGag made a C# version of it (which has some undiagnosed issues) for MainMemory. This compressor has vastly improved compression over old KENS, but it sadly does not achieve perfect compression -- I have yet to devise a non-brute-force way to get past the last remaining barrier to perfect compression. The brute-force solution is not acceptable here because it takes FAR too long.

Anyway: you can get the Windows binaries here; you can get the source code along with that of S2-SSEdit (here); the tools compile and install by default. These command-line tools use a KENS-like interface. I will eventually switch to a KENSSharp-like interface, but for now:
To compress:   <tool> [options] <input> <output>
To decompress: <tool> [options] -x <input> <output>
To recompress: <tool> [options] -c <input> <output>

This post has been edited by flamewing: 15 November 2013 - 07:56 AM

#4 User is offline Hitaxas 

Posted 05 November 2013 - 12:15 PM

  • Pokemon and Sonic themed Twich streamer
  • Posts: 1438
  • Joined: 30-September 07
  • Gender:Male
  • Location:Litchfield,CT
  • Project:Becoming a Twitch partner, and getting my life back together
  • Wiki edits:196
In the 2007 version of Sonic 2 (which, yes, I still use), I cannot get the KosDec routine to work due to the macro. I get an error on @skip\@ telling me it's an invalid symbol, and if I try to change it to anything else, it gives me an error saying: error: open macro definition.

#5 User is offline muteKi 

Posted 05 November 2013 - 01:06 PM

  • Fuck it
  • Posts: 7534
  • Joined: 03-March 05
  • Gender:Male
  • Wiki edits:91
Do you have the LZSS/graph theoretic paper still handy? I'd like to take a look at what's going on in that one.

#6 User is offline MainMemory 

Posted 05 November 2013 - 01:22 PM

  • Every day's the same old thing... Same place, different day...
  • Posts: 4214
  • Joined: 14-August 09
  • Gender:Not Telling
  • Project:SonLVL
  • Wiki edits:1,339

Hitaxas, on 05 November 2013 - 12:15 PM, said:

In the 2007 version of Sonic 2 (which, yes, I still use), I cannot get the KosDec routine to work due to the macro. I get an error on @skip\@ telling me it's an invalid symbol, and if I try to change it to anything else, it gives me an error saying: error: open macro definition.

This code was written with ASM68K syntax in mind, so two changes need to be made for it to work with AS:
  • The "@skip\@" label needs to be replaced with either "skip" or "+".
  • The "endr" directive under KosDec_largecopy needs to be replaced with "endm" (AS uses endm for both macro and rept structures).

This version works for AS.

#7 User is offline Hitaxas 

Posted 05 November 2013 - 01:36 PM

  • Pokemon and Sonic themed Twich streamer
  • Posts: 1438
  • Joined: 30-September 07
  • Gender:Male
  • Location:Litchfield,CT
  • Project:Becoming a Twitch partner, and getting my life back together
  • Wiki edits:196
I did try the first change, which led to the open macro error; number 2 fixed the issue for me. It didn't show up as an error when building, so it was easy to overlook.

Thank you, it is building and running perfectly now. :)

#8 User is offline flamewing 

Posted 05 November 2013 - 07:34 PM

  • Emerald Hunter
  • Posts: 1138
  • Joined: 11-October 10
  • Gender:Male
  • Location:🇫🇷 France
  • Project:Sonic Classic Heroes; Sonic 2 Special Stage Editor; Sonic 3&K Heroes (on hold)
  • Wiki edits:12
To anyone that is using this in S3&K (as Hitaxas probably is): you should also replace the decoder included in the KosQueue for faster KosM decoding -- the KosM decoder includes a copy of the Kosinski decoder so it can detect whether it needs to set a bookmark during vint. You probably won't need to include the lookup table again, unless you want to.

muteKi: I was thinking of writing an explanation myself, including nice graphics; I ended up not doing it because vladikcomper dissuaded me, arguing that probably nobody would be interested. If you give me a couple of days, I can write it up -- the actual articles involved are rather dry and technical.
This post has been edited by flamewing: 05 November 2013 - 08:26 PM

#9 User is offline KingofHarts 

Posted 06 November 2013 - 12:29 AM

  • Posts: 1610
  • Joined: 07-August 10
  • Gender:Male
  • Project:Project Sonic 8x16
  • Wiki edits:1

Cinossu, on 04 November 2013 - 05:34 PM, said:

Highly impressive, and highly useful. I shall definitely be making use of the two of these, ta very much. :>
I would definitely say document this on the wiki under either a SCHG general tutorial of sorts or under the individual compression method pages.


I'll definitely get on this today. I wonder... provided either flamewing or vladikcomper could walk me through a bit of how it works... if I could modify the .DLLs to share the same improvements. NOTE: I use the OLD KENS .DLLs for Triad, as there doesn't seem to be any support for the newer files in a Game Maker implementation.
This post has been edited by KingofHarts: 06 November 2013 - 01:00 AM

#10 User is offline flamewing 

Posted 06 November 2013 - 05:34 AM

  • Emerald Hunter
  • Posts: 1138
  • Joined: 11-October 10
  • Gender:Male
  • Location:🇫🇷 France
  • Project:Sonic Classic Heroes; Sonic 2 Special Stage Editor; Sonic 3&K Heroes (on hold)
  • Wiki edits:12
I do have commit privileges on the KENS repository -- but when I tried using the "new" DLLs (which already have the improved Nemesis and Enigma encoders, but not the improved Kosinski and Saxman ones) with SonMapEd, it failed badly. I am guessing that kram1024 didn't want to, or didn't know how to, preserve binary compatibility. I am not sure I know how myself.

#11 User is offline KingofHarts 

Posted 06 November 2013 - 10:03 AM

  • Posts: 1610
  • Joined: 07-August 10
  • Gender:Male
  • Project:Project Sonic 8x16
  • Wiki edits:1
My issue with a DLL is that it will need to have functionality with Game Maker... I could look into studying a .dll via IDA Pro and try to figure it out, but I wouldn't know where to begin. Never touched a .dll before.

#12 User is offline flamewing 

Posted 07 November 2013 - 02:42 PM

  • Emerald Hunter
  • Posts: 1138
  • Joined: 11-October 10
  • Gender:Male
  • Location:🇫🇷 France
  • Project:Sonic Classic Heroes; Sonic 2 Special Stage Editor; Sonic 3&K Heroes (on hold)
  • Wiki edits:12
As promised, here is the LZSS->Graph write-up.

First, some background and terminology on LZSS: for each byte in the input stream to compress, LZSS maintains a sliding window, composed of a number of bytes that came before that byte, and a look-ahead buffer, composed of the byte in question and a number of bytes that come after it. For Kosinski, the sliding window holds at most the 8192 bytes that precede the current byte, and the look-ahead buffer holds the 256 bytes starting at the current byte.

LZSS compresses by looking for matches between the look-ahead buffer and the sliding window (matches must start in the sliding window); if there is no match (for the very first byte in the file, for example), the value is encoded as a literal byte; otherwise, the bytes in the look-ahead that have a contiguous match in the sliding window are encoded as an (offset, length) pair that indicates how far behind the current byte the match is (offset) and how long it is (length). Descriptor fields in the file, which are sets of bit flags, mark which entries are literal bytes and which are (offset, length) pairs. In general, there will be multiple matches from a given byte, with different lengths and/or offsets.
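
To make the mechanics concrete, here is a minimal LZSS decoding loop in C. The bit layout is invented for illustration (8 flags per descriptor byte consumed LSB first, a 12-bit offset and 4-bit length per pair, offset 0 as the end marker); it is not Kosinski's actual layout, which is described next:

#include <stddef.h>
#include <stdint.h>

/* Decodes src into dst and returns the number of bytes written. */
size_t lzss_decode(const uint8_t *src, uint8_t *dst)
{
    size_t out = 0;
    for (;;) {
        uint8_t desc = *src++;                 /* next 8 flags */
        for (int bit = 0; bit < 8; bit++, desc >>= 1) {
            if (desc & 1) {                    /* flag set: literal byte */
                dst[out++] = *src++;
            } else {                           /* flag clear: (offset, length) pair */
                unsigned offset = src[0] | ((src[1] & 0xF0u) << 4);
                int length = (src[1] & 0x0F) + 3;
                src += 2;
                if (offset == 0)
                    return out;                /* end-of-stream marker */
                while (length--) {             /* copy from the window;  */
                    dst[out] = dst[out - offset];  /* overlap is fine    */
                    out++;
                }
            }
        }
    }
}

Note how literals and matches interleave freely, driven entirely by the descriptor bits, and how the overlapping copy lets a match reproduce runs longer than the distance back.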

Kosinski has 3 different match encodings of different sizes; shorter matches closer to the current byte take up less space, while longer matches compress better, so there is a trade-off in which one to use at any given time. Saxman has an additional mode that allows it to generate a sequence of zeros out of nowhere, which can reduce file size in some cases.
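
To show that trade-off in numbers, here is a sketch of a per-match cost function in C. The bit counts and (distance, length) ranges follow the Kosinski format as publicly documented, but treat them as my reading of the spec; the graph construction below uses exactly this kind of function as its edge weight:

#include <limits.h>

/* Cheapest Kosinski encoding, in bits (descriptor bits included), for
 * a match of 'len' bytes at distance 'dist'; INT_MAX if the pair is
 * not encodable at all. */
int match_cost(int dist, int len)
{
    if (dist >= 1 && dist <= 256 && len >= 2 && len <= 5)
        return 2 + 2 + 8;             /* inline match:        12 bits */
    if (dist >= 1 && dist <= 8192 && len >= 3 && len <= 9)
        return 2 + 13 + 3;            /* full match:          18 bits */
    if (dist >= 1 && dist <= 8192 && len >= 10 && len <= 256)
        return 2 + 13 + 3 + 8;        /* extended full match: 26 bits */
    return INT_MAX;
}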

Now for a primer on graphs. This is a basic overview; more can be seen here. A graph is a set of vertices or nodes, plus a set of edges that connect those nodes. There are several categories of graphs: undirected versus directed; with cycles versus without cycles (acyclic); weighted versus unweighted. The image in the spoiler tag at the end of the post has illustrations of each kind; here is a direct link to it. Graphs are very simple structures, but they are also very powerful; they are everywhere in computer science, from the routing of packets on the Internet to compilers.

But moving on: how do we go from bytes in a file to a graph, and how does that help with LZSS? To answer this, we will build the LZSS graph of a file (I don't think anyone ever called it that in the papers I read; it just feels appropriate). To do that, we need to build the set of nodes and the set of edges connecting them. The LZSS graph is a directed weighted graph; it has more properties, which I will talk about below.

To build the nodes:
  • Add one node for each byte in the file; label this node with the (0-based) position where the byte appears in the file. So the first byte in the file becomes node 0.
  • Add one additional node for the end-of-file; I will refer to it as "the EOF node", but it should be labeled with the length of the file.
As an implementation detail, these nodes don't need to be actually created; we will only need the edges for LZSS.

To build the edges:
  • EOF has no outgoing edges, so it is excluded from the steps below except as the target of an edge.
  • For each node U, add an edge from U to U+1; this edge's weight is the cost, in bits, of encoding the corresponding byte as a literal.
  • For each node U, add an edge from U to each node V > U for which there is an LZSS match that encodes all the bytes from U to V-1. When there are multiple LZSS matches covering this range of bytes, pick the one that takes the fewest bits to encode; this cost is the weight of the edge.
  • For Saxman: if the current byte and the next several bytes (at least 2) are zeros, add an additional edge between the corresponding nodes with the weight of encoding this zero-fill (or replace the cost of the existing edge if it had a higher cost).
To sum up: two nodes U, V with U < V have an edge between them if there is a match for all (V - U) bytes starting at U and ending at V-1; that is, each edge starts at the first byte of a compressed sequence of bytes and ends at the next byte that needs to be compressed. The weight of the edge is the smallest size, in bits, needed to encode this compressed sequence. As far as implementations go, the (offset, length) pair of each edge should also be saved as an attribute of the edge so it can be used later; a sketch follows.
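
As a sketch of that implementation detail (the field names are my own invention):

#include <stddef.h>

/* One edge of the LZSS graph: it covers the bytes from node src up to
 * node src + length, and carries the (offset, length) pair so the
 * final parse can be emitted once the shortest path is known. */
typedef struct {
    size_t src;      /* first byte covered by this encoding            */
    int    length;   /* bytes covered; the edge's target is src+length */
    int    offset;   /* match distance; 0 marks a literal edge         */
    int    cost;     /* edge weight: encoded size in bits              */
} Edge;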

This gives us a graph that (a) has no self-loops; (b) has no edges from a node to an earlier node; (c) only has positive edge costs. (a) and (b) mean we have no cycles, and they combine to make this a directed acyclic graph (DAG) which is already topologically sorted.

Now to put this to use for LZSS: given a path from node 0 to EOF, the sum of the costs of the edges in this path is the size of the compressed file*. So we just need to find the shortest path from node 0 to EOF and we have the lowest file size*. Finding the shortest path in a topologically sorted DAG is trivial, and can even be done as the graph is built (I don't know why I didn't do that in the version I released...).
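
Here is a sketch of that single forward pass in C, with a deliberately naive match finder for clarity (a real encoder would use hash chains or suffix structures); it assumes a match_cost() like the sketch earlier in this post:

#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define LITERAL_COST 9   /* 1 descriptor bit + 8 data bits */

int match_cost(int dist, int len);   /* the cost sketch from above */

/* Longest match at position i against the bytes 'dist' back. */
static int match_length(const uint8_t *d, size_t n, size_t i, size_t dist)
{
    int len = 0;
    while (i + len < n && len < 256 && d[i + len] == d[i + len - dist])
        len++;
    return len;
}

/* Single forward pass over nodes 0..n (already in topological order).
 * cost[i] becomes the cheapest bit count for the first i bytes, and
 * parent[i] records the source of the winning edge so the optimal
 * parse can be walked back from node n. Arrays hold n+1 entries. */
void lzss_shortest_path(const uint8_t *data, size_t n,
                        long cost[], size_t parent[])
{
    cost[0] = 0;
    for (size_t i = 1; i <= n; i++)
        cost[i] = LONG_MAX;

    for (size_t i = 0; i < n; i++) {
        if (cost[i] == LONG_MAX)
            continue;                                 /* unreachable  */
        if (cost[i] + LITERAL_COST < cost[i + 1]) {   /* literal edge */
            cost[i + 1] = cost[i] + LITERAL_COST;
            parent[i + 1] = i;
        }
        size_t max_dist = i < 8192 ? i : 8192;        /* window limit */
        for (size_t dist = 1; dist <= max_dist; dist++) {
            int len = match_length(data, n, i, dist);
            for (int l = 2; l <= len; l++) {          /* match edges  */
                int c = match_cost((int)dist, l);
                if (c != INT_MAX && cost[i] + c < cost[i + l]) {
                    cost[i + l] = cost[i] + c;
                    parent[i + l] = i;
                }
            }
        }
    }
    /* cost[n] is the minimum size in bits, before byte rounding and
     * the descriptor-field subtlety mentioned below. */
}

Since every edge points forward, visiting the nodes in increasing order relaxes each edge exactly once, which is why the pass runs in time linear in the number of edges.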

* Well, almost: this gives the size in bits; not only do we need to round it up to bytes, but there is an additional thorny issue in some LZSS formats, such as Kosinski. Kosinski suffers from what I call "early descriptor loading": whenever Kosinski has used up the last bit in a descriptor field, it immediately loads a new descriptor field (Saxman and Comper only do that when they need a new bit). Thus, the end-of-compression token in Kosinski (which needs 2 bits in the descriptor field) may end up causing a whole new descriptor field (all 16 bits) to be emitted, sometimes with only 1 bit used, other times with 0 bits used. My encoder also handles this case.
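
A concrete case (my own example, just restating the above): suppose a candidate parse leaves exactly 1 unused bit in the current descriptor field when the end-of-compression token arrives. The token's first bit exhausts the field, so a fresh 16-bit field is immediately loaded, and only 1 of its bits (the token's second bit) is ever used; with exactly 2 bits left, the fresh field is loaded with 0 of its bits used. An optimal encoder has to charge those 16 extra bits to the corresponding path when comparing parses.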

Hopefully, this is clear enough; if not, please ask away and I will improve the explanation.

Edit: something I forgot to mention: I thought of generating the graph of a small file as it was being Kosinski-compressed to show here; it turned out so large that it would take an hour to upload, and that was for the smallest non-trivial file I tried.

[Spoiler: image illustrating the graph categories described above]

This post has been edited by flamewing: 07 November 2013 - 05:28 PM

#13 User is offline vladikcomper 

Posted 07 November 2013 - 05:45 PM

  • Posts: 182
  • Joined: 13-June 09
  • Gender:Male
  • Location:Russia
  • Project:Sonic Warped
  • Wiki edits:1
I should've posted this earlier, along with flamewing's first post in this topic, but unfortunately real-life events kept me busy. As you may have noticed, the newly released compressors by flamewing also include a new compression format, Comper. So, here it comes...

Comper Compression

Comper is a new compression format I developed while working on a small proof-of-concept a while ago. My goal was to provide the best possible compression ratio along with decompression speeds faster than any other algorithm offered before. This way, large amounts of data can be loaded "on the fly" within one frame, causing no slowdown, so the compressed data can be handled almost as easily as uncompressed data.

Here's a little demonstration of this format's potential. I compressed the original GHZ art from Sonic 1 with the Nemesis and Comper algorithms and compared both compression effectiveness and decompression times. flamewing's recently released compressors were used to guarantee the best Nemesis compression ratio available so far.

Uncompressed art: 25.9 KB
Nemesis-compressed art: 10.3 KB
Comper-compressed art: 12.6 KB


[Screenshots: decompression timing comparison, Nemesis vs. Comper]

While the resulting file is just 2.3 KB larger, the Comper-compressed art decompressed 10 times faster than the Nemesis version, which further proves that Nemesis has a terrible compression/speed ratio. In some rare cases Comper may even beat Nemesis's compression ratio, just as Kosinski can. In most cases, obviously, Nemesis compresses more efficiently, due to the complexity of the algorithm and the time it takes to decompress. But even though Comper has decompression speed as its primary goal, I was surprised how close it could get to complex and "heavy" compression schemes like Nemesis, as in this particular example with the GHZ art.

Comper's decompression speed is so high that you may even be able to decompress small tilesets in-game on the fly, as if they were uncompressed art. The decompressor's code is also small and compact, just like the format itself -- it's pretty simple, but effective.

Decompressor's code (ASM68K version): http://pastebin.com/aMrfKTbz
Decompressor's code (AS version): http://pastebin.com/fVp8d5Fr

(Yeah, and sorry for not doing AS versions of KosDec and NemDec, I completely forgot. =X)

#14 User is offline Puto 

Posted 08 November 2013 - 08:59 AM

  • Shin'ichi Kudō, detective.
  • Posts: 2012
  • Joined: 31-July 05
  • Gender:Male
  • Location:Portugal, Oeiras
  • Project:Part of Team Megamix, but haven't done any actual work in ages.
  • Wiki edits:51
Curious: would this thing be feasible for compressing Sonic's art (or any other DPLCable art)?

#15 User is offline flamewing 

Posted 08 November 2013 - 10:03 AM

  • Emerald Hunter
  • Posts: 1138
  • Joined: 11-October 10
  • Gender:Male
  • Location:🇫🇷 France
  • Project:Sonic Classic Heroes; Sonic 2 Special Stage Editor; Sonic 3&K Heroes (on hold)
  • Wiki edits:12
Unless you can fit all of the decompressed art in main RAM, it would not be suitable for that: you can't start decompressing a part of the file until you have decompressed everything before it, which is not very good for DPLCs. If you were to break the file into contiguous chunks of art for each DPLC frame, the compression ratio would probably be bad, as there is not enough space to find good matches.

For DPLCs, a better choice would probably be Sik's UFTC format -- or it would be, except FileDen seems to have gone the way of the dodo. Its compression ratio is much worse, but it allows decompression of specific tiles.
