
Sonic & Knuckles Collection C port

Discussion in 'Engineering & Reverse Engineering' started by BenoitRen, Jan 11, 2024.

  1. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    After more research into global variables, I got a clearer picture and have decompiled/ported some of the graphics code. However, I know little about the Mega Drive's VDP and Mega Drive graphics programming in general, so I'm hoping the following technical dive will make sense to those who do have such experience.

    g_convertedHscrollbuff is an array of unsigned longs which, in this context, is filled by the following code:
    Code (C):
    for (int i = 0; i < 224; ++i) {
      g_convertedHscrollbuff[i] = -*g_pHscrollbuff2 & 0x3FF;
      g_pHscrollbuff2 += 2;
    }
    g_pHscrollbuff2 initially points to the second word of the horizontal scroll buffer, so after this loop g_convertedHscrollbuff contains a converted version of the second word of each double word.

    After this, g_pVramPlane is set to point to either Plane A or B's location inside the Mega Drive's VRAM.

    Next, for each value in g_convertedHscrollbuff, a function is called for each of 20 vertical scroll entries (half of the vertical scroll buffer's contents):
    Code (C):
    void FUN_00415039() {
      g_pSystemRam = &g_systemRam;
      g_pConvertedHscrollbuff = g_convertedHscrollbuff;
      for (int hscrollIndex = 0; hscrollIndex < 224; ++hscrollIndex) {
        for (int vscrollIndex = 0; vscrollIndex < 20; ++vscrollIndex) {
          FUN_0041507f(hscrollIndex, vscrollIndex, *g_pConvertedHscrollbuff);
        }
        ++g_pConvertedHscrollbuff;
      }
    }
    The previous function was optimised ASM, but simple enough to figure out. This next function, however, was horror:
    Code (C):
    void FUN_0041507f(int hscrollIndex, int vscrollIndex, unsigned long hscrollValue) {
      unsigned long unknown1 = g_vscrollbuffCopy[vscrollIndex] + hscrollIndex;
      unsigned long unknown2 = unknown1 & 7;
      hscrollValue += vscrollIndex << 4;
      int index = ((unknown1 >> 3) & 0x1F) << 7;
      unsigned short* pVramPlane = &g_pVramPlane[index];
      unsigned char* pSystemRam = g_pSystemRam;
      index = hscrollValue >> 3 & 0x3F;
      unsigned short planeValue = pVramPlane[index];
      unsigned char hscrollValueLowerBits = hscrollValue & 7;
      if (hscrollValueLowerBits == 0) {
        pSystemRam = FUN_0040f62c(pSystemRam, planeValue, unknown2);
      }
      else {
        pSystemRam = FUN_0040f778(pSystemRam, planeValue, unknown2, hscrollValueLowerBits);
      }
      ++index;
      index &= 0x3F;
      planeValue = pVramPlane[index];
      pSystemRam = FUN_0040f62c(pSystemRam, planeValue, unknown2);
      ++index;
      index &= 0x3F;
      if (hscrollValueLowerBits != 0) {
        planeValue = pVramPlane[index];
        pSystemRam = FUN_0040f810(pSystemRam, planeValue, unknown2, hscrollValueLowerBits);
      }
      g_pSystemRam += 16;
    }
    g_vscrollbuffCopy is a copy of a part of the Mega Drive's vertical scroll buffer. It and hscrollValue (a value from g_pConvertedHscrollbuff) seem to be the stars of this function. Depending on the three lower bits of hscrollValue, pixels from an area in video memory selected using g_vscrollbuffCopy are converted in a certain way. Thankfully, those three functions were easier to port:
    Code (C):
    unsigned char* FUN_0040f62c(unsigned char* pSystemRam, unsigned short planeValue, unsigned char unknown2) {
      unsigned long* pVram = &g_mdVram[(planeValue & 0x7FF) << 5];
      unsigned long vramValue;
      if (planeValue & 0x1000) {
        vramValue = pVram[7 - unknown2];
      }
      else {
        vramValue = pVram[unknown2];
      }
      if (planeValue & 0x800) {
        // rotate left
        vramValue = vramValue << 8 | vramValue >> 24;
        unsigned char* pPixelConv = &pixelConvTable[planeValue >> 9 & 0x70];

        for (int i = 0; i < 4; ++i) {
          pSystemRam[1] = pPixelConv[(vramValue & 0xFF) >> 4];
          pSystemRam[0] = pPixelConv[vramValue & 0xF];
          // rotate left
          vramValue = vramValue << 8 | vramValue >> 24;
          pSystemRam += 2;
        }
      }
      else {
        unsigned char* pPixelConv = &pixelConvTable[planeValue >> 9 & 0x70];

        for (int i = 0; i < 4; ++i) {
          pSystemRam[0] = pPixelConv[(vramValue & 0xFF) >> 4];
          pSystemRam[1] = pPixelConv[vramValue & 0xF];
          vramValue >>= 8;
          pSystemRam += 2;
        }
      }

      return pSystemRam;
    }


    unsigned char* FUN_0040f778(unsigned char* pSystemRam, unsigned short planeValue, unsigned char unknown2, unsigned char hscrollValueLowerBits) {
      unsigned long* pVram = &g_mdVram[(planeValue & 0x7FF) << 5];
      unsigned long vramValue;
      if (planeValue & 0x1000) {
        vramValue = pVram[7 - unknown2];
      }
      else {
        vramValue = pVram[unknown2];
      }
      // swap byte order
      vramValue = (vramValue & 0xFF000000) >> 24
                | (vramValue & 0x00FF0000) >> 8
                | (vramValue & 0x0000FF00) << 8
                | (vramValue & 0x000000FF) << 24;
      unsigned rotateBitCnt = hscrollValueLowerBits << 2;
      if (planeValue & 0x800) {
        // rotate right
        vramValue = vramValue >> rotateBitCnt | vramValue << (32 - rotateBitCnt);
        unsigned char* pPixelConv = &pixelConvTable[planeValue >> 9 & 0x70];
        unsigned char cnt = 8 - hscrollValueLowerBits;

        do {
          *pSystemRam++ = pPixelConv[vramValue & 0xF];
          // rotate right
          vramValue = vramValue >> 4 | (vramValue & 0xF) << 28;
        } while (--cnt != 0);
      }
      else {
        // rotate left
        vramValue = vramValue << rotateBitCnt | vramValue >> (32 - rotateBitCnt);
        unsigned char* pPixelConv = &pixelConvTable[planeValue >> 9 & 0x70];
        unsigned char cnt = 8 - hscrollValueLowerBits;

        do {
          *pSystemRam++ = pPixelConv[vramValue & 0xF];
          // rotate left
          vramValue = vramValue << 4 | vramValue >> 28;
        } while (--cnt != 0);
      }

      return pSystemRam;
    }


    unsigned char* FUN_0040f810(unsigned char* pSystemRam, unsigned short planeValue, unsigned char unknown2, unsigned char hscrollValueLowerBits) {
      unsigned long* pVram = &g_mdVram[(planeValue & 0x7FF) << 5];
      unsigned long vramValue;
      // 0x1000 (vertical flip) to match the two functions above; this test read 0x800 before
      if (planeValue & 0x1000) {
        vramValue = pVram[7 - unknown2];
      }
      else {
        vramValue = pVram[unknown2];
      }
      // swap byte order
      vramValue = (vramValue & 0xFF000000) >> 24
                | (vramValue & 0x00FF0000) >> 8
                | (vramValue & 0x0000FF00) << 8
                | (vramValue & 0x000000FF) << 24;
      if (planeValue & 0x800) {
        unsigned char* pPixelConv = &pixelConvTable[planeValue >> 9 & 0x70];

        do {
          *pSystemRam++ = pPixelConv[vramValue & 0xF];
          // rotate right
          vramValue = vramValue >> 4 | vramValue << 28;
        } while (--hscrollValueLowerBits != 0);
      }
      else {
        unsigned char* pPixelConv = &pixelConvTable[planeValue >> 9 & 0x70];

        do {
          // rotate left
          vramValue = vramValue << 4 | vramValue >> 28;
          *pSystemRam++ = pPixelConv[vramValue & 0xF];
        } while (--hscrollValueLowerBits != 0);
      }

      return pSystemRam;
    }
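
For anyone following the address arithmetic, here is a minimal sketch of what the shifts and masks above appear to compute, assuming a plane that is 64 cells wide and 32 cells high (from the 0x3F and 0x1F masks), 2 bytes per nametable entry, and 32 bytes per 8x8 tile at 4 bits per pixel. The function and constant names are hypothetical. Note that these are byte offsets: `<< 7` is 128 bytes per plane row and `<< 5` is 32 bytes per tile. Whether the ported code should use byte offsets or element indices depends on how g_pVramPlane and g_mdVram are declared, which the thread doesn't show.

```c
#include <stdint.h>

enum {
    PLANE_WIDTH_CELLS  = 64, /* assumption: taken from the 0x3F column mask */
    PLANE_HEIGHT_CELLS = 32, /* assumption: taken from the 0x1F row mask */
    BYTES_PER_ENTRY    = 2,  /* one 16-bit nametable entry per cell */
    BYTES_PER_TILE     = 32, /* 8 rows of 4 bytes */
    BYTES_PER_TILE_ROW = 4   /* 8 pixels at 4 bits each */
};

/* Byte offset of the nametable entry covering pixel (x, y), from the plane's start. */
static uint32_t nametable_byte_offset(uint32_t y, uint32_t x) {
    uint32_t row = (y >> 3) & (PLANE_HEIGHT_CELLS - 1);
    uint32_t col = (x >> 3) & (PLANE_WIDTH_CELLS - 1);
    return row * PLANE_WIDTH_CELLS * BYTES_PER_ENTRY + col * BYTES_PER_ENTRY;
}

/* Byte offset in VRAM of one 8-pixel row of a tile, honouring vertical flip. */
static uint32_t tile_row_byte_offset(uint32_t tile, uint32_t rowInTile, int vflip) {
    uint32_t row = vflip ? 7u - (rowInTile & 7u) : (rowInTile & 7u);
    return tile * BYTES_PER_TILE + row * BYTES_PER_TILE_ROW;
}
```

With an unsigned short* plane pointer, the element index would be the byte offset divided by 2; with an unsigned long* VRAM pointer, divided by 4.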
    And this is, according to MainMemory's labelling, the pixel conversion table:
    Code (C):
    /* 0040f5ac */ unsigned char pixelConvTable[128] = {
       16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,  30,  31,
       16,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,
       16,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,
       16,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,  79,
       16, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
       16, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
       16, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
       16, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207
    };
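
The table follows a regular pattern, which the sketch below reproduces. This is only an observation from the listed values, not something taken from the original binary, and the function name is hypothetical. Each row of 16 is selected by `planeValue >> 9 & 0x70`, that is, by the top three bits of the plane value. Pixel 0 always maps to 16, the lower four rows start at 16, 32, 48 and 64, and the upper four rows are the same with 128 added.

```c
/* Rebuild the 128-entry table from the pattern visible in the listing above. */
static void build_pixel_conv_table(unsigned char table[128]) {
    for (unsigned row = 0; row < 8; ++row) {
        /* rows 0-3 and 4-7 share the same bases: 16, 32, 48, 64 */
        unsigned base = ((row & 3u) + 1u) * 16u;
        if (row & 4u) {
            base += 128u; /* upper four rows have the high bit set */
        }
        for (unsigned pixel = 0; pixel < 16; ++pixel) {
            /* pixel value 0 maps to 16 in every row */
            table[row * 16u + pixel] = (unsigned char)(pixel == 0 ? 16u : base + pixel);
        }
    }
}
```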
    I know it's a lot, but I'm hoping that someone will look at this and tell me it makes sense to them, because I'm kind of lost.

    EDIT: curled some loops that were obviously unrolled, and cleaned up rotation code.
    EDIT2: fixed a shift direction
     
    Last edited: Aug 15, 2024
  2. MainMemory

    MainMemory

    Has-Been Modder Tech Member
    4,819
    408
    63
    Myself
    If you aren't familiar with the format of MD plane nametables, it's described here: https://md.railgun.works/index.php?title=VDP#Nametables
    On the MD, vertical scrolling for planes can be set to full screen (the first value controls the scrolling of the entire screen) or per-column. These functions are set up for per-column mode, where the screen is cut into 16 pixel sections that can be scrolled independently.
    The loop at the top iterates over each column of vscroll, thus the inner function is responsible for filling 16 pixels horizontally. The lower bits of the hscroll value are being checked to see if the scanline is not aligned to an exact tile boundary (8 pixels), and if so, it draws two partial tiles and one full tile, else it draws two full tiles.
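
As a quick reference, a plane nametable entry can be unpacked like this. It's a minimal sketch of the standard 16-bit layout (priority, palette line, vertical flip, horizontal flip, tile index); the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Decoded fields of one 16-bit plane nametable entry (hypothetical struct). */
typedef struct {
    unsigned priority; /* bit 15 */
    unsigned palette;  /* bits 14-13: palette line 0-3 */
    unsigned vflip;    /* bit 12 (0x1000) */
    unsigned hflip;    /* bit 11 (0x0800) */
    unsigned tile;     /* bits 10-0 (0x07FF): tile index */
} NametableEntry;

static NametableEntry decode_nametable_entry(uint16_t value) {
    NametableEntry e;
    e.priority = (value >> 15) & 1u;
    e.palette  = (value >> 13) & 3u;
    e.vflip    = (value >> 12) & 1u;
    e.hflip    = (value >> 11) & 1u;
    e.tile     = value & 0x7FFu;
    return e;
}
```

This lines up with the masks in the posted functions: 0x7FF picks the tile, 0x1000 picks the mirrored row, and 0x800 reverses the pixel order.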
     
  3. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    I've since read an introduction to Mega Drive graphics and after some searching found the format of the plane nametable values, which allowed me to clarify some code. Despite that, I still didn't have an idea of the purpose of those functions. So, thanks for the clarification. :)

    I'm almost done porting the ASM that handles the planes, and will extract that code to a separate file and push it. There's so much of it, and a lot of code is duplicated (either entirely or with small but important differences), probably for performance reasons. The thing had to run on the first Pentiums, after all.

    One does wonder what approach was best. For Sonic CD, they manually converted the ASM to C, and ported the game's graphics. For this collection they might have thought it'd be quicker to machine translate the ASM, but then they were obligated to port the Mega Drive's VDP.
     
  4. MainMemory

    MainMemory

    Has-Been Modder Tech Member
    4,819
    408
    63
    Myself
    Sorry if I didn't make it clear, these functions are drawing the background plane's tiles into the pixel buffer used for the screen, one line at a time, and one 16-pixel column at a time.
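
To make the 16-pixel column idea concrete, here is a small sketch of how many pixels each tile contributes, based on the low three bits of the converted hscroll value. The struct and function names are hypothetical; the counts mirror the loop counters in the posted functions (8 - hscrollValueLowerBits, then 8, then hscrollValueLowerBits).

```c
/* Pixel counts for the tiles covering one 16-pixel column (hypothetical struct).
   The three fields always add up to 16. */
typedef struct {
    unsigned first;  /* pixels taken from the first tile */
    unsigned middle; /* pixels taken from the second tile (always a full tile) */
    unsigned last;   /* pixels taken from a third tile; 0 when tile-aligned */
} ColumnSplit;

static ColumnSplit split_column(unsigned hscrollValue) {
    unsigned fine = hscrollValue & 7u; /* offset inside the first tile */
    ColumnSplit s;
    s.first  = 8u - fine;
    s.middle = 8u;
    s.last   = fine;
    return s;
}
```

When the value is tile-aligned, `last` is 0 and the column is just two full tiles.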
     
  5. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    I had a feeling that what I called g_mdSystemRam wasn't actually the system's RAM...

    Relatedly, do you know why it's converting the horizontal scroll buffer's values?
    Code (C):
    g_convertedHscrollbuff[i] = -*g_pHscrollbuff2 & 0x3FF;
    What this seems to do is bitflip the value (in a non-portable manner). Why does it do this?
     
  6. MainMemory

    MainMemory

    Has-Been Modder Tech Member
    4,819
    408
    63
    Myself
    It's not bitflipping, it's negating and confining the result to a certain range. I think normally, the horizontal scroll values on MD inform how far to the right each line is, while the code here needs them to be scrolling to the left instead, as that can be used as a simple index into the tile array (if a line is scrolled 8 pixels left, you simply skip one tile to the right when rendering).
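
A minimal sketch of that conversion, written with unsigned arithmetic so it doesn't rely on how negative numbers are represented. It assumes the scroll buffer holds 16-bit words; the function names are hypothetical.

```c
#include <stdint.h>

/* Negate a scroll value and wrap it into the range 0..1023. */
static uint32_t convert_hscroll(uint16_t raw) {
    return (0u - (uint32_t)raw) & 0x3FFu;
}

/* Which tile column (0..63) the line starts in. */
static uint32_t first_tile_column(uint32_t converted) {
    return (converted >> 3) & 0x3Fu;
}

/* Which pixel (0..7) inside that tile the line starts at. */
static uint32_t pixel_in_tile(uint32_t converted) {
    return converted & 7u;
}
```

For example, a raw value of 0xFFF8 (which is -8 as a signed 16-bit word) converts to 8, so rendering starts one whole tile further right with no partial tile.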
     
  7. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    Here it is! The VDP code for background planes!
    Okay, no, it's not exactly bitflipping; it's calculating the complement (example: 1000 becomes 24). It's not just negating, because the AND operation removes the sign bit.
     
  8. MainMemory

    MainMemory

    Has-Been Modder Tech Member
    4,819
    408
    63
    Myself
    Either way, the end result is that the scrolling values are flipped around to be in the opposite direction.
    Also, neat fact about the pixel buffer that I just remembered now: whenever a tile is called for with the priority flag set, the pixels written to the buffer will have the high bit set as a flag. Future pixels drawn in that spot without the priority flag set will be ignored.
     
  9. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    You must be talking about this piece of code that's present in multiple functions:
    Code (C):
    for (int i = 0; i < 8; ++i) {
      if (!(*p_pixelbuffer & 0x80) && (pixels & 0xF)) {
        *p_pixelbuffer = (pixels & 0xF) + offset;
      }
      ++p_pixelbuffer;
    }
    EDIT: Removed a line too many, and it bothered me.
     
    Last edited: Aug 17, 2024
  10. MainMemory

    MainMemory

    Has-Been Modder Tech Member
    4,819
    408
    63
    Myself
    Yes.
     
  11. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    I've just pushed the sprite blitting functions! The porting was this quick because there's essentially only one blitting function; all the others are a variation. There are also two helper routines related to masked sprites, but they're small and almost identical.

    The function at address 0040336e is the last roadblock to completing the graphics pipeline. It seems to be responsible for updating the screen surface buffers.
     
  12. Blastfrog

    Blastfrog

    See ya starside. Member
    I’m a total noob, but my gut feeling tells me perhaps a hybrid approach would be ideal. Start with machine ASM translation, but don’t bother emulating the MD VDP and handle that in a more platform native manner, probably in hand-written x86 ASM. Compilers have gotten better since then, but no C code no matter how efficient the compiler may be will ever surpass the performance of carefully written ASM.
     
  13. MainMemory

    MainMemory

    Has-Been Modder Tech Member
    4,819
    408
    63
    Myself
    You really didn't need to be writing code for a PC game in ASM in 1997.
     
  14. Devon

    Devon

    pfp by @litchui on Twitter Tech Member
    1,580
    1,958
    93
    your mom
    Tell that to Chris Sawyer when he programmed Roller Coaster Tycoon :V (it does make sense, it was a complex game for its time)
     
  15. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    I've just pushed the screen surface blitting functions! Before that, I finished decompiling the rest of the graphics code.

    Next step: getting all of this compiled together and praying I don't open a rift in the space-time continuum in doing so.
     
  16. MainMemory

    MainMemory

    Has-Been Modder Tech Member
    4,819
    408
    63
    Myself
    All this talk of graphics functions is reminding me of how I wanted to expand the game's graphical capabilities with mods (allow sprite graphics to pull from anywhere in memory, using separate VRAM blocks for sprites vs planes, bitmap sprites, extended palettes). I suppose those things will be easier now with all the code decompiled.
     
  17. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    With the power of source code, we can do anything! :D

    I was hoping to get my proof of concept running today, but it was quite the ordeal to even get it to compile.

    Code::Blocks has issues with the Windows resource files, and it won't even tell me why. The error messages don't say anything more than "syntax error". Whoever coded that should be deeply ashamed of themselves. MSVC did report an error with a virtual key constant, and was happy with them after I added an #include I forgot. I tried using the compiled resource files that MSVC generated with Code::Blocks, but it didn't recognise the format. I ended up commenting every CONTROL definition just to get it over with. This hair-pulling experience cost me at least an hour.

    Next were complaints from the linker about multiple definitions for a set of global variables. Right, I forgot to add some guards. No, don't put them in the implementation file, but in the header file! The compiler still complained. After some thinking I figured out that the issue was because I have two files for my global variables: globals.c and globals.cpp. Both of those get compiled to globals.o, which doesn't work. I changed the filename to globalscpp.cpp.

    A bunch of functions weren't found by the linker despite them being included in my project. Luckily I quickly found the cause: C++ functions can't see C functions and vice-versa. I opted to compile everything as C++, which fixed that issue, but then my C files were treated as C++. Despite what people would have you believe, C and the C subset of C++ are not the same! I had to make lots of silly modifications just to get that code to compile under the changed conditions (it compiled fine as C code).
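
For reference, the usual alternative to compiling everything as C++ is to keep the C files as C and wrap their declarations in a linkage guard, so the C++ side looks for unmangled names. A minimal sketch; the header, file and function names are hypothetical and not from this project.

```c
/* gfx_port.h (hypothetical header shared by the C and C++ files) */
#ifndef GFX_PORT_H
#define GFX_PORT_H

#ifdef __cplusplus
extern "C" { /* only C++ compilers see this: use C linkage for these names */
#endif

int add_tile_offset(int tile, int offset); /* hypothetical function */

#ifdef __cplusplus
}
#endif

#endif /* GFX_PORT_H */

/* gfx_port.c: still compiled as C */
int add_tile_offset(int tile, int offset) {
    return tile + offset;
}
```

A C++ file that includes the header can then call add_tile_offset() and the linker will find the C definition.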

    I now have an executable, but it crashes while retrieving the controller configuration from the Windows registry due to a segmentation fault. I don't know why yet.
     
    Last edited: Aug 20, 2024
  18. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    I'll spare you the details. I got something to show up:

    upload_2024-8-20_18-20-25.png

    Yeah, a black screen. But when I go full screen:

    screenshot.jpg
    Clearly, Sonic and Tails have been redecorating.

    It's not easy to see, but this isn't some random mess. Those are the background tiles with SONIC and MILES, but incorrectly placed. Also, they animate, but too fast.

    Going back to windowed mode causes a segmentation fault more often than not.

    On the plus side, the music works just fine.

    EDIT: I've committed all the rest of the code that got me to this point. It was about time!
     
    Last edited: Aug 20, 2024
  19. BenoitRen

    BenoitRen

    Tech Member
    956
    580
    93
    I just compared the output of my Enigma decompression code and clownacy's, and they don't match. There must be a bug in my implementation. Damn it. :(

    EDIT: Found one of them, it was due to an operator precedence error. But now the output looks even more wrong.
     
    Last edited: Aug 20, 2024
  20. Devon

    Devon

    pfp by @litchui on Twitter Tech Member
    1,580
    1,958
    93
    your mom
    Code (Text):
    wk.data = *p_source++ << 8;

    Should be
    Code (Text):
    wk.data = *(unsigned short*)p_source++

    Or if this value is stored as big endian and you want to make sure it is guaranteed to be read as such (I have a feeling it might be, judging by the 68000 code. I've not taken a look at how S&KC handles data, so bear with me)
    Code (Text):
    wk.data = *p_source++ << 8;   // high byte first
    wk.data |= *p_source++;       // then low byte; separate statements keep the read order defined

    As per this 68000 code:
    Code (ASM):
        move.b  (a0)+,d5      ; store seventh byte
        asl.w   #8,d5         ; shift up by a byte
        move.b  (a0)+,d5      ; store eighth byte in lower register byte

    If the Enigma data is exactly how it is in the original Genesis game, then you should also apply the split byte reading to this as well. If it's not, and these 2 values are stored as little endian, then you can disregard this (although, it probably should be changed if you want the code to be portable to big-endian platforms)
    Code (Text):
    unsigned short incremental_copy = *(unsigned short*)p_source++ + offset;
    unsigned short literal_copy = *(unsigned short*)p_source++ + offset;

    As for minor things, I did see this:
    Code (Text):
    offset += 0x8000;

    It should probably be
    Code (Text):
    offset |= 0x8000;

    As per this 68000 code:
    Code (ASM):
        btst   d6,d5                  ; is the bit set?
        beq.s  .skippriority          ; if not, branch
        ori.w  #high_priority,d3      ; set high priority bit

    .skippriority:

    Same deal with the X and Y flip bits.
    Code (ASM):
        add.b   d1,d1
        bcc.s   .skipyflip    ; if d4 was < $10
        subq.w  #1,d6         ; get next bit number
        btst    d6,d5
        beq.s   .skipyflip
        ori.w   #flip_y,d3    ; set Y-flip bit

    .skipyflip:
        add.b   d1,d1
        bcc.s   .skipxflip    ; if d4 was < 8
        subq.w  #1,d6
        btst    d6,d5
        beq.s   .skipxflip
        ori.w   #flip_x,d3    ; set X-flip bit

    .skipxflip:
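
To show why OR is the safer choice for flag bits, here's a tiny sketch. The two forms give the same result while the bit is still clear, but adding carries out of the word if the bit was already set. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define HIGH_PRIORITY 0x8000u /* bit 15 of a nametable entry */

/* Setting the flag with OR leaves an already-set bit alone. */
static uint16_t set_priority_or(uint16_t value) {
    return (uint16_t)(value | HIGH_PRIORITY);
}

/* Adding the flag wraps around (and clears the bit) if it was already set. */
static uint16_t set_priority_add(uint16_t value) {
    return (uint16_t)(value + HIGH_PRIORITY);
}
```

So 0x0123 becomes 0x8123 either way, but 0x8123 stays 0x8123 with OR and turns back into 0x0123 with addition.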

    Code (Text):
    p_wk->data |= *p_wk->p_source++;

    Should probably be
    Code (Text):
    p_wk->data = (p_wk->data & 0xFF00) | *p_wk->p_source;

    Seeing as the lower byte is straight up overwritten, and doesn't advance source pointer, as per this 68000 code:
    Code (ASM):
        move.b    (a0),d5        ; get next byte

    Code (Text):
    p_wk->data |= *p_wk->p_source++;
    p_wk->data <<= 8;
    p_wk->data |= *p_wk->p_source++;

    Should be
    Code (Text):
    p_wk->data = *p_wk->p_source++;
    p_wk->data <<= 8;
    p_wk->data |= *p_wk->p_source++;

    As per this 68000 code
    Code (ASM):
        move.b   (a0)+,d5    ; get current byte, move onto next byte
        lsl.w    #8,d5       ; shift up by a byte
        move.b   (a0)+,d5    ; store next byte in lower register byte
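
If the data does turn out to be big endian, the byte-by-byte reads above could be wrapped in one helper so every 16-bit read goes through the same code on any host. A minimal sketch, assuming the source is addressed as bytes; the function name is hypothetical.

```c
#include <stdint.h>

/* Read a big-endian 16-bit value and advance the source pointer by two bytes. */
static uint16_t read_be16(const unsigned char **pp_source) {
    const unsigned char *p = *pp_source;
    uint16_t value = (uint16_t)((p[0] << 8) | p[1]); /* high byte first */
    *pp_source = p + 2;
    return value;
}
```

Usage would look like `wk.data = read_be16(&p_source);`, which also avoids incrementing the pointer twice inside one expression.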

    EDIT: Added a bunch more suggestions
     
    Last edited: Aug 21, 2024