Sonic CD Gems Collection Linker Disassembly Dumps

Discussion in 'Engineering & Reverse Engineering' started by Devon, Jun 12, 2022.

  1. Devon

    Double posting, because I think it's warranted. As I said in the previous post, I found a tool called dwarf2cpp that parses DWARF v1 data and generates C/C++ skeletons from it. That makes it easier to analyze variables, structures, and function prototypes along with their local variables, and it also sets up the folder structure of the source code. No actual code is decompiled; it only dumps those declarations. It should be a good resource for a possible decompilation in the future.

    Download
    GitHub Repository

    Some samples of what it generated:
    [​IMG]
    [​IMG]
    [​IMG]
    [​IMG]

    Here's a sample decompilation I did with this information (note: structure names and constant names had to be made up):
    Code (Text):
    void action(void) {
        actwkt *pActwk;
        i32 i;

        pActwk = actwk;
        for (i = 0; i < ACTWK_SLOTS; ++i) {
            if (pActwk->actno != 0) {
                act_tbl[pActwk->actno](pActwk);
            }
            ++pActwk;
        }
    }

    void speedset(actwkt *pActwk) {
        i32u xpos;
        i32u ypos;
        i16u spd;

        ypos = pActwk->yposi;
        xpos = pActwk->xposi;
        spd = pActwk->xspeed;
        xpos.l += (spd.w << 8);

        spd = pActwk->yspeed;
        if (!(pActwk->actfree[PLAYCTRL] & 8)) {
            if (spd.w >= 0 ||
                (!(pActwk->actfree[PLAYCTRL] & 2) ||
                spd.w >= -0x800)) {
                if (!(pActwk->actfree[PLAYCTRL] & 4)) {
                    pActwk->yspeed.w += 0x38;
                }
            }
        }
        if (pActwk->yspeed.w >= 0) {
            if (pActwk->yspeed.w >= 0x1000) {
                pActwk->yspeed.w = 0x1000;
            }
        }
        ypos.l += spd.w << 8;

        pActwk->xposi.l = xpos.l;
        pActwk->yposi.l = ypos.l;
    }

    void speedset2(actwkt *pActwk) {
        i32u xpos;
        i32u ypos;
        i32 spd;
        i32 actwkno;
        i16 d1;

        xpos = pActwk->xposi;
        ypos = pActwk->yposi;

        spd = pActwk->xspeed.w;
        if (pActwk->cddat & 8) {
            actwkno = pActwk->actfree[PLAYRIDE];
            if (actwk[actwkno].actno == 0x1E) {
                d1 = -0x100;
                if (!(pActwk->cddat & 1)) {
                    d1 = -d1;
                }
                spd += d1;
            }
        }
        spd <<= 8;
        xpos.l += spd;

        spd = pActwk->yspeed.w;
        spd <<= 8;
        ypos.l += spd;
        pActwk->xposi = xpos;
        pActwk->yposi = ypos;
    }
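The arithmetic in the sample above can be read as fixed-point math: the 16-bit speed is shifted left by 8 before being added to the 32-bit position. Here is a minimal sketch of that step, assuming positions are 16.16 fixed point and speeds are 8.8 (an inference from the `<< 8`, not something the debug info states). The names `apply_speed` and `apply_gravity` are hypothetical, and the flag checks from `speedset` are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: position is 16.16 fixed point, speed is 8.8 fixed point.
   Scaling the speed by 256 (the "<< 8" in the sample) lines up the
   fractional bits before the add. */
static int32_t apply_speed(int32_t pos, int16_t spd) {
    /* Multiply instead of shifting: left-shifting a negative value is
       undefined behaviour in standard C. */
    return pos + (int32_t)spd * 256;
}

/* Gravity step using the constants from the sample: add 0x38 per frame
   and cap the downward speed at 0x1000. The PLAYCTRL flag tests that
   guard this in speedset are omitted here. */
static int16_t apply_gravity(int16_t yspeed) {
    int v;
    assert(yspeed <= INT16_MAX - 0x38);  /* keep the sum in int16 range */
    v = yspeed + 0x38;
    if (v > 0x1000) {
        v = 0x1000;
    }
    return (int16_t)v;
}
```

Under that reading, a speed of 0x0100 moves an object by exactly 0x10000, i.e. one whole unit of the position's upper 16 bits per call. Note that `speedset` adds the Y speed it read before gravity was applied, so the two helpers would be called in that order to match it.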
     
    Last edited: Sep 17, 2022
  2. BenoitRen

    Hi Devon. Could you please explain how you went about creating your sample decompilation? I can't find good references for reading PS2 MIPS assembly. I've only had experience with reading NES 6502 assembly until now.

    action() is a global function, so it appears in multiple files. So far I've figured out the first instructions do general housekeeping to update the stack and return addresses.

    The first actual instruction seems to be:
    Code (Text):
    lui          a0,hi(actwk+257)
    According to a reference, lui means "load a constant into the upper half of a word". So here the high part of actwk's address is loaded into the upper half of a0 (a function argument register)? What does the +257 mean here?
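For context on how `lui` is normally paired with a following `addiu` to build a full 32-bit address, here is a small sketch. The helper names and the example address are made up for illustration; they are not actwk's real address. The one subtle point is that `addiu` sign-extends its 16-bit immediate, so the upper half has to be rounded up whenever the lower half has its top bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits as loaded by lui. The +0x8000 compensates for addiu
   sign-extending its 16-bit immediate. */
static uint16_t hi16(uint32_t addr) {
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Lower 16 bits, viewed as the signed immediate that addiu adds. */
static int16_t lo16(uint32_t addr) {
    return (int16_t)(addr & 0xFFFFu);
}

/* Emulates:  lui reg, hi16(addr)  ;  addiu reg, reg, lo16(addr) */
static uint32_t lui_addiu(uint32_t addr) {
    uint32_t reg = (uint32_t)hi16(addr) << 16;  /* lui zeroes the low half */
    reg += (uint32_t)(int32_t)lo16(addr);       /* addiu: sign-extended add */
    assert(reg == addr);                        /* the pair rebuilds the address */
    return reg;
}
```

For the made-up address 0x01239000, `hi16` gives 0x0124 rather than 0x0123, because the low half 0x9000 is treated as -0x7000 by `addiu`.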

    Can you please get me started? Thanks!
     
  3. Devon

    To be honest, I have only a very basic grasp of MIPS assembly. I just used Ghidra to produce a base decompilation of the function, and then followed the assembly code to make it more accurate to what was actually programmed. The line numbers associated with addresses dumped via dwarf2cpp also helped with grouping instructions.
     
  4. Cooljerk

    Not familiar with PS2 MIPS, but in most assembly languages, that would be the offset to move to after loading the constant address actwk. Think of Actwk like an array, with Actwk being the base offset of the array. It's saying the constant to load is the 257th element of the array in Actwk.

    Looking further at the code, it seems actwk is a structure containing many values byte-packed into chunks. So that would be the offset of some smaller value, i.e. xposi or yposi or something else in actwk.
     
  5. BenoitRen

    I've found the actual action function using the referenced memory address. I'm stuck on the following instructions:
    Code (Text):
    lui          s0,hi(actwk+256)
    addiu        s0,s0,lo(actwk+13696)
    I've determined that the struct type that actwk points to is 74 bytes, which is a strange size. +256 would mean the fourth element, member cddat. Then +13696 would mean the start of struct number 185, which doesn't make sense as the array of actwk structs has 128 elements.

    I must be doing something wrong. No idea how that translates to:
    Code (Text):
    pActwk = actwk;
    EDIT: Looking back at the symbol table, actwk's size is 8704. Divided by 128, that'd mean that each struct is 68 bytes. When I recalculate, I count 66 bytes. There must be two bytes of padding.

    EDIT2: Yup, there were two padding bytes:
    Code (Text):
    struct anon0
    {
      unsigned char actno;      // 1
      unsigned char actflg;     // 1
      unsigned short sproffset; // 2
      _anon3** patbase;         // 4
      _anon5 xposi;             // 4
      _anon5 yposi;             // 4
      _anon9 xspeed;            // 2
      _anon9 yspeed;            // 2
      _anon9 mspeed;            // 2
      unsigned char sprhsize;   // 1
      unsigned char sprvsize;   // 1
      unsigned char sprhs;      // 1
      unsigned char sprpri;     // 1
      unsigned char patno;      // 1
      /* padding byte */        // 1
      _anon9 mstno;             // 2
      unsigned char patcnt;     // 1
      unsigned char pattim;     // 1
      unsigned char pattimm;    // 1
      unsigned char colino;     // 1
      unsigned char colicnt;    // 1
      unsigned char cddat;      // 1
      unsigned char cdsts;      // 1
      unsigned char r_no0;      // 1
      unsigned char r_no1;      // 1
      /* padding byte */        // 1
      _anon9 direc;             // 2
      _anon9 userflag;          // 2
      unsigned char dummy[2];   // 2
      unsigned char actfree[22];// 22
    };
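The size arithmetic for the struct listing above (66 bytes of members, 2 bytes of padding, 68 bytes per element, 128 elements, 8704 bytes in total) can be checked mechanically. This is a sketch under some assumptions: `_anon5` is modelled as a 4-byte union and `_anon9` as a 2-byte union, and the 32-bit target pointer `patbase` is stored as a `uint32_t` so the check also holds on a 64-bit host. The type names `anon5_t`, `anon9_t`, and `actwk_layout` are stand-ins, not names from the debug info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef union { int32_t l; } anon5_t;  /* assumed: 4 bytes, 4-byte aligned */
typedef union { int16_t w; } anon9_t;  /* assumed: 2 bytes, 2-byte aligned */

typedef struct {
    uint8_t  actno, actflg;
    uint16_t sproffset;
    uint32_t patbase;                  /* 32-bit pointer on the target */
    anon5_t  xposi, yposi;
    anon9_t  xspeed, yspeed, mspeed;
    uint8_t  sprhsize, sprvsize, sprhs, sprpri, patno;
    /* the compiler inserts 1 padding byte here to align mstno */
    anon9_t  mstno;
    uint8_t  patcnt, pattim, pattimm, colino, colicnt;
    uint8_t  cddat, cdsts, r_no0, r_no1;
    /* the compiler inserts 1 padding byte here to align direc */
    anon9_t  direc, userflag;
    uint8_t  dummy[2];
    uint8_t  actfree[22];
} actwk_layout;

enum { ACTWK_COUNT = 128 };

/* Compile-time checks: these fail the build if the layout differs. */
static_assert(offsetof(actwk_layout, mstno) == 28, "padding after patno");
static_assert(offsetof(actwk_layout, direc) == 40, "padding after r_no1");
static_assert(sizeof(actwk_layout) == 68, "66 bytes of members + 2 padding");
static_assert(sizeof(actwk_layout) * ACTWK_COUNT == 8704, "symbol table size");
```

If the assumed union sizes are right, the two padding bytes fall exactly where the listing marks them, and no packing pragma is needed to reach 68 bytes.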
    Finally, my decompilation of the action function:
    Code (Text):
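    void action()
    {
      _anon0* pActwk = actwk;
      for (int i = 0; i < 128; ++i) {
        if (pActwk->actno != 0) {
          // The "<< 2" mirrors the byte-offset scaling seen in the disassembly.
          // C array indexing already scales by the pointer size, so a plain
          // index without the shift may be what the original source used.
          act_tbl[(pActwk->actno - 1) << 2](pActwk);
        }
        ++pActwk;
      }
    }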
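On the `<< 2` in the table lookup above: a shift like that in disassembly usually turns an index into a byte offset for 4-byte entries, which C array indexing does implicitly. A minimal sketch of the equivalence, using a hypothetical two-entry table and a cut-down object type (none of these names come from the game, and whether the real table is indexed by `actno` or `actno - 1` is not settled in this thread):

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned char actno; int hits; } obj_t;  /* hypothetical */
typedef void (*act_fn)(obj_t *);

static void act_a(obj_t *o) { o->hits += 1; }
static void act_b(obj_t *o) { o->hits += 10; }

static act_fn demo_tbl[] = { act_a, act_b };

/* C-level dispatch: the index is scaled by sizeof(act_fn) automatically. */
static void dispatch_by_index(obj_t *o) {
    size_t idx = (size_t)o->actno - 1;
    assert(idx < sizeof demo_tbl / sizeof demo_tbl[0]);
    demo_tbl[idx](o);
}

/* Assembly-style dispatch: scale the index to a byte offset by hand.
   With 4-byte pointers, "* sizeof(act_fn)" is the same as "<< 2". */
static void dispatch_by_offset(obj_t *o) {
    size_t byte_off = ((size_t)o->actno - 1) * sizeof(act_fn);
    act_fn fn = *(act_fn *)((unsigned char *)demo_tbl + byte_off);
    fn(o);
}
```

Both functions call the same entry for a given `actno`, which is why a decompiled C version would normally drop the explicit shift.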
     
    Last edited: Apr 9, 2023
  6. BenoitRen

    I've made lots of progress the past week, reading the disassembly assisted by Ghidra. I've decompiled almost all the code that's meant to go in the action.c file. There's just one function left, but it uses an undefined global variable called @113 and an existing stack variable that's probably set by the calling function. The calling function is not part of action.c, so I'm moving on to other files until I have more data.

    For some reason Ghidra is bugged regarding global variables. It seems to have multiplied the addresses by 2, which not only makes it point to incorrect addresses, but also makes them fall outside of the memory range. This makes it impossible to name or assign types to them. I have no idea of how to resolve this.
     
  7. Brainulator

    I want to ask: how did you all manage to get the DWARF v1 data properly interpreted by Ghidra? From what I've seen, Ghidra only natively supports DWARF versions 2 and above. An extension exists to handle this sort of thing, but to my knowledge it does not work on newer versions of Ghidra (or at the very least, I cannot recompile it with current tools as described here).
     
  8. BenoitRen

    I entered the type data manually into Ghidra.
     
  9. BenoitRen

    This DWARFv1 debug info is a gift that keeps on giving.

    Yesterday I was decompiling a function with many local variables. Until now, I've always figured out what name to link to which register based on context. But I was thinking, shouldn't such information be part of the debug symbols?

    So, I opened up a file and searched for the function name. Right below it, there's this:
    Code (Text):
    0004a75e:<45>TAG_formal_parameter
    0004a764    AT_sibling(0004a78b)
    0004a76a    AT_mod_u_d_type(<5>MOD_pointer_to (000494cc))
    0004a773    AT_location(<11> OP_BASEREG(29) OP_CONST(192) OP_ADD)
    0004a782    AT_name(pActwk)
    0004a78b:<24>TAG_lexical_block
    0004a791    AT_sibling(0004aa47)
    0004a797    AT_low_pc(0101f6f0)
    0004a79d    AT_high_pc(0101faa0)
    0004a7a3:<47>TAG_local_variable
    0004a7a9    AT_sibling(0004a7d2)
    0004a7af    AT_mod_u_d_type(<5>MOD_pointer_to (000494cc))
    0004a7b8    AT_location(<11> OP_BASEREG(29) OP_CONST(180) OP_ADD)
    0004a7c7    AT_name(pActwk_w)
    0004a7d2:<42>TAG_local_variable
    0004a7d8    AT_sibling(0004a7fc)
    0004a7de    AT_mod_u_d_type(<5>MOD_pointer_to (000494cc))
    0004a7e7    AT_location(<5> OP_REG(30))
    0004a7f0    AT_name(pPlayerwk)
    0004a7fc:<45>TAG_local_variable
    0004a802    AT_sibling(0004a829)
    0004a808    AT_mod_fund_type(<4>MOD_pointer_to MOD_pointer_to FT_unsigned_char)
    0004a810    AT_location(<11> OP_BASEREG(29) OP_CONST(176) OP_ADD)
    0004a81f    AT_name(pTbltbl)
    0004a829:<36>TAG_local_variable
    0004a82f    AT_sibling(0004a84d)
    0004a835    AT_mod_fund_type(<3>MOD_pointer_to FT_unsigned_char)
    0004a83c    AT_location(<5> OP_REG(18))
    0004a845    AT_name(pTbla)
    0004a84d:<33>TAG_local_variable
    0004a853    AT_sibling(0004a86e)
    0004a859    AT_fund_type(FT_char)
    0004a85d    AT_location(<5> OP_REG(23))
    0004a866    AT_name(patno)
    0004a86e:<36>TAG_local_variable
    0004a874    AT_sibling(0004a892)
    0004a87a    AT_fund_type(FT_char)
    0004a87e    AT_location(<5> OP_REG(22))
    0004a887    AT_name(userflag)
    0004a892:<34>TAG_local_variable
    0004a898    AT_sibling(0004a8b4)
    0004a89e    AT_fund_type(FT_signed_short)
    0004a8a2    AT_location(<5> OP_REG(16))
    0004a8ab    AT_name(time_x)
    0004a8b4:<34>TAG_local_variable
    0004a8ba    AT_sibling(0004a8d6)
    0004a8c0    AT_fund_type(FT_signed_short)
    0004a8c4    AT_location(<5> OP_REG(21))
    0004a8cd    AT_name(time_y)
    0004a8d6:<34>TAG_local_variable
    0004a8dc    AT_sibling(0004a8f8)
    0004a8e2    AT_fund_type(FT_signed_short)
    0004a8e6    AT_location(<5> OP_REG(20))
    0004a8ef    AT_name(posi_x)
    0004a8f8:<34>TAG_local_variable
    0004a8fe    AT_sibling(0004a91a)
    0004a904    AT_fund_type(FT_signed_short)
    0004a908    AT_location(<5> OP_REG(17))
    0004a911    AT_name(posi_y)
    0004a91a:<46>TAG_local_variable
    0004a920    AT_sibling(0004a948)
    0004a926    AT_fund_type(FT_signed_short)
    0004a92a    AT_location(<11> OP_BASEREG(29) OP_CONST(190) OP_ADD)
    0004a939    AT_name(posi_x_start)
    0004a948:<45>TAG_local_variable
    0004a94e    AT_sibling(0004a975)
    0004a954    AT_fund_type(FT_signed_short)
    0004a958    AT_location(<11> OP_BASEREG(29) OP_CONST(188) OP_ADD)
    0004a967    AT_name(posi_x_step)
    0004a975:<40>TAG_local_variable
    0004a97b    AT_sibling(0004a99d)
    0004a981    AT_fund_type(FT_signed_short)
    0004a985    AT_location(<5> OP_REG(19))
    0004a98e    AT_name(reverse_flag)
    0004a99d:<41>TAG_local_variable
    0004a9a3    AT_sibling(0004a9c6)
    0004a9a9    AT_fund_type(FT_signed_short)
    0004a9ad    AT_location(<11> OP_BASEREG(29) OP_CONST(186) OP_ADD)
    0004a9bc    AT_name(count0x)
    0004a9c6:<43>TAG_local_variable
    0004a9cc    AT_sibling(0004a9f1)
    0004a9d2    AT_user_def_type(00049c25)
    0004a9d8    AT_location(<11> OP_BASEREG(29) OP_CONST(172) OP_ADD)
    0004a9e7    AT_name(count_x)
    0004a9f1:<43>TAG_local_variable
    0004a9f7    AT_sibling(0004aa1c)
    0004a9fd    AT_user_def_type(00049c25)
    0004aa03    AT_location(<11> OP_BASEREG(29) OP_CONST(168) OP_ADD)
    0004aa12    AT_name(count_y)
    0004aa1c:<39>TAG_local_variable
    0004aa22    AT_sibling(0004aa43)
    0004aa28    AT_user_def_type(00049c25)
    0004aa2e    AT_location(<11> OP_BASEREG(29) OP_CONST(164) OP_ADD)
    0004aa3d    AT_name(tmp)
    The local variable is named with AT_name, and AT_location refers to its storage location.
    No more guessing!
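Reading those AT_location entries by hand gets tedious, so here is a rough sketch of turning one into a register name or stack slot. It assumes the numbers in OP_REG and OP_BASEREG are plain MIPS general-purpose register numbers, which fits the dump (29 paired with stack-sized offsets looks like sp) but is not confirmed anywhere in this thread. `describe_location` is a hypothetical helper, and only the two location shapes that appear in the dump are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Conventional o32 names for MIPS registers 0..31. Only s0-s7 (16-23),
   sp (29) and fp (30) actually occur in the dump above. */
static const char *const mips_reg[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra"
};

/* Writes e.g. "s2" or "sp+192" into out. Returns 1 on success, 0 if the
   text matches neither supported shape. */
static int describe_location(const char *loc, char *out, size_t out_len) {
    int reg = 0, off = 0;
    const char *p;

    assert(loc != NULL && out != NULL && out_len > 0);

    /* Stack slot: OP_BASEREG(r) OP_CONST(n) OP_ADD  ->  r + n */
    p = strstr(loc, "OP_BASEREG(");
    if (p != NULL &&
        sscanf(p, "OP_BASEREG(%d) OP_CONST(%d) OP_ADD", &reg, &off) == 2 &&
        reg >= 0 && reg < 32) {
        snprintf(out, out_len, "%s+%d", mips_reg[reg], off);
        return 1;
    }

    /* Value held directly in a register: OP_REG(r) */
    p = strstr(loc, "OP_REG(");
    if (p != NULL && sscanf(p, "OP_REG(%d)", &reg) == 1 &&
        reg >= 0 && reg < 32) {
        snprintf(out, out_len, "%s", mips_reg[reg]);
        return 1;
    }
    return 0;
}
```

Fed the entries above, this would place pActwk at sp+192, pPlayerwk in fp, and pTbla in s2, provided the register numbering assumption holds.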
     
  10. Devon

    The dwarf2cpp dump actually includes the local variable names in the functions. I assume you were decompiling actb_init_a?

    Code (Text):
    //
    // Start address: 0x101f6f0
    void actb_init_a(_anon1* pActwk)
    {
        _anon1* pActwk_w;
        _anon1* pPlayerwk;
        unsigned char** pTbltbl;
        unsigned char* pTbla;
        char patno;
        char userflag;
        short time_x;
        short time_y;
        short posi_x;
        short posi_y;
        short posi_x_start;
        short posi_x_step;
        short reverse_flag;
        short count0x;
        _anon5 count_x;
        _anon5 count_y;
        _anon5 tmp;
     
  11. BenoitRen

    It does have the local variable names, but when I read the assembly code, I see references to registers, or worse, stack offsets. Now I can link the two when translating to C without guessing.
     
  12. Devon

    Ah, right, makes sense. That's great to hear!
     
  13. BenoitRen

    How do we know that Sonic Gems Collection was compiled with Metrowerks CodeWarrior?

    I'm asking because my installation of CodeWarrior doesn't come with everything that's necessary to compile for the PS2 (and the documentation treats the PS2 as an afterthought), which makes me wonder how you get it to work. Presumably you need the SDK, but that has its own compiler, so you wouldn't need CodeWarrior.
     
  14. Devon

    While I can't answer how to get it to work, I can mention these 2 things:

    1. Its linker (which I was able to run by itself) was able to parse the debug info, which no other tool is able to parse.
    2. These signatures appear in each ELF file (the former even shows up repeatedly), with "MW" being short for "Metrowerks":
    [​IMG]
    [​IMG]

    The GameCube version has even more stuff:
    [​IMG]
    [​IMG]
    [​IMG]

    There is also a bunch of stuff for "MetroTRK", a target resident kernel that acts as a debug monitor for the CodeWarrior debugger, which starts with this:
    [​IMG]
     
    Last edited: Aug 25, 2023