don't click here

Nemesis Decompression NemBCT_ShortCode Prefix Free

Discussion in 'Engineering & Reverse Engineering' started by Cokie, Jun 13, 2023.

  1. Cokie

    Cokie

    C. Okie Member
    76
    22
    8
    The Nemesis decompression subroutine NemBCT_ShortCode takes care of codes that arent 8 bits in length . It writes a word with the L

    length of code , the repeat and pallete index. In the form 0LRP.


    It then afterwards continues to fill multiple words with the same data in proceeding words im guessing . This im guessing is all the indexes that are calculated for longer codes witj same prefix in order to make it prefix free? So it dies this for the number of permutations of 8 bit codes that have this prefix?


    For instance, NemBCT_ShortCode , for the Nemesis Title TM art , fills the code length in bits (4) , repeat (0) and pallete index (0) in FFAB80-FFAB9E as


    04000400040004000400040004000400

    04000400040004000400040004000400'


    If the words data after the first are just repeats for a longer code with the same prefix, why then does NEMDEC_ProcessCompressedData access multiple/ several of these words and not just the first one?


    Also , is the part of the compressed data that describes the code (ie length bits repeat palette index_ called the dictionary? Any help on overall understanding and correcting my terminologies would be immensely appreciated
     
    Last edited: Jun 13, 2023
  2. MarkeyJester

    MarkeyJester

    Original, No substitute Resident Jester
    2,208
    437
    63
    Japan
    To increase entropy decoding speed.

    The table is an entropy index table (...yes it's called a "dictionary" if you will), but it's accessed via huffman encoding. This means the binary encoding length is variable and unpredictable, you can have many branching paths of a variety of lengths.

    This makes it significantly complicated to decode the binary path for and obtain the right entry in the table, to further complicate things, the tree structure isn't fixed, it can be structured differently per compressed data. Ordinarily you would check one bit at a time and follow the branch correctly in correlation to the compressed data's tree structure. It's not impossible to write code which can do this, but it will be incredibly slow to process, and would be quite messy and complicated.

    The entropy table which is created is less of an index table, and more of a look up table.

    We'll do a dummy example, something really simple, let's pretend our entropy index table only has 3 entries:

    Let's say the binary code 0 accesses the first entry (and let's say this first entry is $20).
    Let's say the binary code 10 accesses the second entry (and let's say this second entry is $69).
    Let's say the binary code 11 accesses the third entry (and let's say this third entry is $FF).

    The entropy index table is $20, $69, and $FF, but when you unpack this tree it'll look something like this:

    Code (Text):
    1. $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20
    2. $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20
    3. $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20
    4. $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20
    5. $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20
    6. $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20
    7. $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20
    8. $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20 $20
    9. $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69
    10. $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69
    11. $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69
    12. $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69 $69
    13. $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF
    14. $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF
    15. $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF
    16. $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF $FF
    Let's pretend our bitfield will be 01110101.

    The first bit "0" already should read $20. Instead of isolating that 1 bit on its own by shifting it onto the carry, or AND masking it, or wasting CPU time... We just quickly and simply reference it from the table above (no tricks). 01110101 is $75, the $75th byte in the table above is $20.

    Let's pretend our bitfield is 00101001, again, the first bit is "0", we should read $20. Again, just reference directly from the table, no AND, no shift, 00101001 is $29, the $29th byte in the table above is... $20.

    It doesn't matter what these lower 7 bits are, the upper bit just needs to be 0. Anything from 00000000 ($00) to 01111111 ($7F) will always access byte $20 from that table.

    Let's pretend our bitfield is 11010010, the first two bits are "11", we should read $FF. 11010010 is $D2, the $D2th byte in our table is... $FF. If the bitfield were 11110011, the same thing would happen, the first two bits "11" need to read $FF. 11110011 is $F3. The $F3rd byte is $FF.

    It doesn't matter what these lower 6 bits are, the upper two bits just need to be 11. Anything from 11000000 ($C0) to 11111111 ($FF) will always access byte $FF from that table.

    Of course, because the huffman branches in the bitfield are of varying length, you need a way of shifting the bits up the right amount afterwards. Well... this is why each entry in the table is a WORD and not a BYTE. So our table will actually be:

    Code (Text):
    1. $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120
    2. $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120
    3. $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120
    4. $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120
    5. $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120
    6. $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120
    7. $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120
    8. $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120 $0120
    9. $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269
    10. $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269
    11. $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269
    12. $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269 $0269
    13. $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF
    14. $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF
    15. $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF
    16. $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF $02FF
    The lower byte will be the entropy tables' byte we need to read, and the upper byte will be the number of bits to shift forwards after reading (or rather the length of that particular huffman branch).

    It is nothing more than an elaborate way of cutting corners to save on CPU time.
     
    Last edited: Jun 13, 2023
  3. Cokie

    Cokie

    C. Okie Member
    76
    22
    8
    Thank you very much this is incredible . Sorry I didn't get back sooner . Im still sorting through it. Still have some questions if you dont mind . For now, why is it more of a lookup table than an index table? I read https://en.wikipedia.org/wiki/Lookup_table but it compared it in contrast to a HASH.
     
  4. MarkeyJester

    MarkeyJester

    Original, No substitute Resident Jester
    2,208
    437
    63
    Japan
    Wikipedia often overcomplicates explanations, you're overthinking this.
     
  5. Cokie

    Cokie

    C. Okie Member
    76
    22
    8
    I noticed Wikipedia overcomplicates. defines codes from the translating a thought to words , to using semaphore for ships , to then a mathematical definition in Information theory of the source alphabet set {a,b,c} to a target set {0,1} where it is mapped.

    So do I not really need all the math definitions perspective of definitions and the math equations of things like Huffman Encoding?

    What's a better less complicated way that will still give all the needed definitions and needed foundations? Youtube?

    I understand stuff but fear I like some things completely . Like my definitions and big picture of the entire Nem Cooperson. And what some things really are in a broad sense outside of compression like encoding and decoding ( which mean a broader meaning in say decoding a carrier signal of a FM radio or encoding decoding in encryption ) .