Double posting, because I think it's warranted. As I said in the previous post, I found a tool called dwarf2cpp that parses DWARF v1 data and generates C/C++ skeletons from it. This makes it easier to analyze variables, structures, and function prototypes along with their local variables, and it also sets up the folder structure of the source code. No actual code is decompiled; it just dumps those things. It should be a good resource for a possible decompilation in the future.

Download
GitHub Repository

Here's a sample decompilation I did with this information (note: structure names and constant names had to be made up):

Code (Text):
void action(void)
{
    actwkt *pActwk;
    i32 i;

    pActwk = actwk;
    for (i = 0; i < ACTWK_SLOTS; ++i) {
        if (pActwk->actno != 0) {
            act_tbl[pActwk->actno](pActwk);
        }
        ++pActwk;
    }
}

void speedset(actwkt *pActwk)
{
    i32u xpos;
    i32u ypos;
    i16u spd;

    ypos = pActwk->yposi;
    xpos = pActwk->xposi;
    spd = pActwk->xspeed;
    xpos.l += (spd.w << 8);
    spd = pActwk->yspeed;
    if (!(pActwk->actfree[PLAYCTRL] & 8)) {
        if (spd.w >= 0 || (!(pActwk->actfree[PLAYCTRL] & 2) || spd.w >= -0x800)) {
            if (!(pActwk->actfree[PLAYCTRL] & 4)) {
                pActwk->yspeed.w += 0x38;
            }
        }
    }
    if (pActwk->yspeed.w >= 0) {
        if (pActwk->yspeed.w >= 0x1000) {
            pActwk->yspeed.w = 0x1000;
        }
    }
    ypos.l += spd.w << 8;
    pActwk->xposi.l = xpos.l;
    pActwk->yposi.l = ypos.l;
}

void speedset2(actwkt *pActwk)
{
    i32u xpos;
    i32u ypos;
    i32 spd;
    i32 actwkno;
    i16 d1;

    xpos = pActwk->xposi;
    ypos = pActwk->yposi;
    spd = pActwk->xspeed.w;
    if (pActwk->cddat & 8) {
        actwkno = pActwk->actfree[PLAYRIDE];
        if (actwk[actwkno].actno == 0x1E) {
            d1 = -0x100;
            if (!(pActwk->cddat & 1)) {
                d1 = -d1;
            }
            spd += d1;
        }
    }
    spd <<= 8;
    xpos.l += spd;
    spd = pActwk->yspeed.w;
    spd <<= 8;
    ypos.l += spd;
    pActwk->xposi = xpos;
    pActwk->yposi = ypos;
}
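For anyone following the arithmetic in speedset, here is a minimal sketch of the fixed-point update on its own. It assumes positions are 16.16 fixed point and speeds are 8.8, which is what the shift by 8 suggests but is not confirmed by the debug info. The names `fix16_16`, `fix8_8`, `speed_to_pos_delta`, and `apply_gravity` are made up for illustration, and the control-flag checks from the sample are left out. Multiplying by 256 instead of shifting sidesteps left-shifting a negative value, which is undefined behavior in C.

```c
#include <stdint.h>

/* Hypothetical type names; the formats are an assumption inferred
   from the "<< 8" in the sample decompilation. */
typedef int32_t fix16_16;   /* position: 16 integer bits, 16 fraction bits */
typedef int16_t fix8_8;     /* speed: 8 integer bits, 8 fraction bits */

/* Convert an 8.8 speed into a 16.16 position delta for one frame.
   Equivalent to "spd << 8" for non-negative values, but well defined
   for negative speeds too. */
static fix16_16 speed_to_pos_delta(fix8_8 spd)
{
    return (fix16_16)spd * 256;
}

/* Add the per-frame gravity constant (0x38) and cap the downward
   speed at 0x1000, as the sample does. The PLAYCTRL flag checks
   from the sample are omitted here. */
static fix8_8 apply_gravity(fix8_8 yspeed)
{
    int32_t v = (int32_t)yspeed + 0x38;
    if (v >= 0x1000) {
        v = 0x1000;
    }
    return (fix8_8)v;
}
```

Under these assumptions a speed of 0x0100 moves an object exactly one whole unit per frame, since 0x0100 * 256 is 0x00010000.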
Hi Devon. Could you please explain how you went about creating your sample decompilation? I can't find good references for reading PS2 MIPS assembly. Until now I've only had experience reading NES 6502 assembly.

action() is a global function, so it appears in multiple files. So far I've figured out that the first instructions do general housekeeping to update the stack and return addresses. The first actual instruction seems to be:

Code (Text):
lui a0,hi(actwk+257)

According to a reference, lui means "To load a constant into the upper half of a word." So here the high part of actwk is loaded into the high part of a0 (a function argument register)? What does +257 mean here? Can you please get me started? Thanks!
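As background on the lui question, here is a small sketch in C of what `lui` does and how MIPS code commonly pairs it with a following `addiu` to build a full 32-bit address. The helper names (`mips_lui`, `mips_addiu`, `addr_hi`, `addr_lo`) are made up for illustration and the example address is arbitrary, not taken from the game. Because `addiu` sign-extends its 16-bit immediate, the high half has to be rounded up by one whenever the low half has its top bit set.

```c
#include <stdint.h>

/* Effect of "lui rt, imm": imm lands in the upper 16 bits,
   and the lower 16 bits are cleared. */
static uint32_t mips_lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* Effect of "addiu rt, rs, imm": the 16-bit immediate is
   sign-extended to 32 bits before being added. */
static uint32_t mips_addiu(uint32_t rs, uint16_t imm)
{
    return rs + (uint32_t)(int32_t)(int16_t)imm;
}

/* High half of an address as used with lui. Adding 0x8000 first
   compensates for the sign extension done by the later addiu. */
static uint16_t addr_hi(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Low half of an address as used with addiu. */
static uint16_t addr_lo(uint32_t addr)
{
    return (uint16_t)(addr & 0xFFFFu);
}
```

For an arbitrary address such as 0x0123ABCD, `addr_hi` gives 0x0124 rather than 0x0123, and `mips_addiu(mips_lui(0x0124), 0xABCD)` lands back on 0x0123ABCD.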
To be honest, I have a very very bare grasp on MIPS assembly. I just used Ghidra to provide a base decompilation of the function, and then attempted to follow the assembly code to make it more accurate to what was actually programmed. The line numbers associated with addresses dumped via dwarf2cpp also helped with grouping instructions.
I'm not familiar with PS2 MIPS, but in most assembly languages that would be the offset to move to after loading the constant address actwk. Think of actwk as an array, with actwk being the base address of the array. It's saying the constant to load is the 257th element of the array at actwk. Looking further at the code, it seems actwk is a structure containing many values byte-packed into chunks. So that is the offset of some smaller value, i.e. xposi or yposi or something else in actwk.
I've found the actual action function using the referenced memory address. I'm stuck on the following instructions:

Code (Text):
lui s0,hi(actwk+256)
addiu s0,s0,lo(actwk+13696)

I've determined that the struct type that actwk points to is 74 bytes, which is a strange size. +256 would mean the fourth element, member cddat. Then +13696 would mean the start of struct number 185, which doesn't make sense as the array of actwk structs has 128 elements. I must be doing something wrong. No idea how that translates to:

Code (Text):
pActwk = actwk;

EDIT: Looking back at the symbol table, actwk's size is 8704. Divided by 128, that'd mean that each struct is 68 bytes. When I recalculate, I count 66 bytes. There must be two bytes of padding.

EDIT2: Yup, there were two padding bytes:

Code (Text):
struct anon0 {
    unsigned char actno;        // 1
    unsigned char actflg;       // 1
    unsigned short sproffset;   // 2
    _anon3** patbase;           // 4
    _anon5 xposi;               // 4
    _anon5 yposi;               // 4
    _anon9 xspeed;              // 2
    _anon9 yspeed;              // 2
    _anon9 mspeed;              // 2
    unsigned char sprhsize;     // 1
    unsigned char sprvsize;     // 1
    unsigned char sprhs;        // 1
    unsigned char sprpri;       // 1
    unsigned char patno;        // 1
    // (padding byte)
    _anon9 mstno;               // 2
    unsigned char patcnt;       // 1
    unsigned char pattim;       // 1
    unsigned char pattimm;      // 1
    unsigned char colino;       // 1
    unsigned char colicnt;      // 1
    unsigned char cddat;        // 1
    unsigned char cdsts;        // 1
    unsigned char r_no0;        // 1
    unsigned char r_no1;        // 1
    // (padding byte)
    _anon9 direc;               // 2
    _anon9 userflag;            // 2
    unsigned char dummy[2];     // 2
    unsigned char actfree[22];  // 22
};

Finally, my decompilation of the action function:

Code (Text):
void action()
{
    _anon0* pActwk = actwk;
    for (int i = 0; i < 128; ++i) {
        if (pActwk->actno != 0) {
            act_tbl[(pActwk->actno - 1) << 2](pActwk);
        }
        ++pActwk;
    }
}
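The padding arithmetic above can be checked mechanically. The sketch below restates the struct with fixed-width stand-in types and lets the compiler confirm the offsets and the 68-byte size. The stand-ins are assumptions: `patbase` is stored as a 32-bit integer so that the host's pointer size does not matter, and the `_anon5` / `_anon9` members are replaced by plain `int32_t` / `int16_t` of the same size. The type name `actwk_layout` is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout-only stand-in for the struct in the post. Field types are
   assumptions chosen to match the sizes listed there. */
typedef struct {
    uint8_t  actno;
    uint8_t  actflg;
    uint16_t sproffset;
    uint32_t patbase;      /* 32-bit pointer on the target */
    int32_t  xposi;
    int32_t  yposi;
    int16_t  xspeed;
    int16_t  yspeed;
    int16_t  mspeed;
    uint8_t  sprhsize;
    uint8_t  sprvsize;
    uint8_t  sprhs;
    uint8_t  sprpri;
    uint8_t  patno;        /* offset 26; compiler pads 1 byte after this */
    int16_t  mstno;        /* 2-byte aligned, so it starts at 28 */
    uint8_t  patcnt;
    uint8_t  pattim;
    uint8_t  pattimm;
    uint8_t  colino;
    uint8_t  colicnt;
    uint8_t  cddat;
    uint8_t  cdsts;
    uint8_t  r_no0;
    uint8_t  r_no1;        /* offset 38; compiler pads 1 byte after this */
    int16_t  direc;        /* starts at 40 */
    int16_t  userflag;
    uint8_t  dummy[2];
    uint8_t  actfree[22];
} actwk_layout;

/* 66 bytes of members plus 2 bytes of alignment padding. */
_Static_assert(offsetof(actwk_layout, mstno) == 28, "padding after patno");
_Static_assert(offsetof(actwk_layout, direc) == 40, "padding after r_no1");
_Static_assert(sizeof(actwk_layout) == 68, "struct size");
_Static_assert(sizeof(actwk_layout) * 128 == 8704, "matches symbol size");
```

If any assertion fails on a given host compiler, that only means the stand-in types do not reproduce the target's alignment rules there; it says nothing about the game itself.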
I've made lots of progress the past week, reading the disassembly assisted by Ghidra. I've decompiled almost all the code that's meant to go in the action.c file. There's just one function left, but it uses an undefined global variable called @113 and an existing stack variable that's probably set by the calling function. The calling function is not part of action.c, so I'm moving on to other files until I have more data.

For some reason Ghidra is bugged regarding global variables. It seems to have multiplied the addresses by 2, which not only makes them point to incorrect addresses, but also makes them fall outside of the memory range. This makes it impossible to name them or assign types to them. I have no idea how to resolve this.
I want to ask: how did you all manage to get the DWARF v1 data properly interpreted by Ghidra? From what I've seen, Ghidra only natively supports DWARF versions 2 and above. An extension exists to handle this sort of thing, but to my knowledge it does not work on newer versions of Ghidra (or at the very least, I cannot recompile it with current tools as described here).
This DWARFv1 debug info is a gift that keeps on giving. Yesterday I was decompiling a function with many local variables. Until now, I've always figured out what name to link to which register based on context. But I was thinking, shouldn't such information be part of the debug symbols? So, I opened up a file and searched for the function name. Right below it, there's this:

Code (Text):
0004a75e:<45>TAG_formal_parameter
    0004a764 AT_sibling(0004a78b)
    0004a76a AT_mod_u_d_type(<5>MOD_pointer_to (000494cc))
    0004a773 AT_location(<11> OP_BASEREG(29) OP_CONST(192) OP_ADD)
    0004a782 AT_name(pActwk)
0004a78b:<24>TAG_lexical_block
    0004a791 AT_sibling(0004aa47)
    0004a797 AT_low_pc(0101f6f0)
    0004a79d AT_high_pc(0101faa0)
0004a7a3:<47>TAG_local_variable
    0004a7a9 AT_sibling(0004a7d2)
    0004a7af AT_mod_u_d_type(<5>MOD_pointer_to (000494cc))
    0004a7b8 AT_location(<11> OP_BASEREG(29) OP_CONST(180) OP_ADD)
    0004a7c7 AT_name(pActwk_w)
0004a7d2:<42>TAG_local_variable
    0004a7d8 AT_sibling(0004a7fc)
    0004a7de AT_mod_u_d_type(<5>MOD_pointer_to (000494cc))
    0004a7e7 AT_location(<5> OP_REG(30))
    0004a7f0 AT_name(pPlayerwk)
0004a7fc:<45>TAG_local_variable
    0004a802 AT_sibling(0004a829)
    0004a808 AT_mod_fund_type(<4>MOD_pointer_to MOD_pointer_to FT_unsigned_char)
    0004a810 AT_location(<11> OP_BASEREG(29) OP_CONST(176) OP_ADD)
    0004a81f AT_name(pTbltbl)
0004a829:<36>TAG_local_variable
    0004a82f AT_sibling(0004a84d)
    0004a835 AT_mod_fund_type(<3>MOD_pointer_to FT_unsigned_char)
    0004a83c AT_location(<5> OP_REG(18))
    0004a845 AT_name(pTbla)
0004a84d:<33>TAG_local_variable
    0004a853 AT_sibling(0004a86e)
    0004a859 AT_fund_type(FT_char)
    0004a85d AT_location(<5> OP_REG(23))
    0004a866 AT_name(patno)
0004a86e:<36>TAG_local_variable
    0004a874 AT_sibling(0004a892)
    0004a87a AT_fund_type(FT_char)
    0004a87e AT_location(<5> OP_REG(22))
    0004a887 AT_name(userflag)
0004a892:<34>TAG_local_variable
    0004a898 AT_sibling(0004a8b4)
    0004a89e AT_fund_type(FT_signed_short)
    0004a8a2 AT_location(<5> OP_REG(16))
    0004a8ab AT_name(time_x)
0004a8b4:<34>TAG_local_variable
    0004a8ba AT_sibling(0004a8d6)
    0004a8c0 AT_fund_type(FT_signed_short)
    0004a8c4 AT_location(<5> OP_REG(21))
    0004a8cd AT_name(time_y)
0004a8d6:<34>TAG_local_variable
    0004a8dc AT_sibling(0004a8f8)
    0004a8e2 AT_fund_type(FT_signed_short)
    0004a8e6 AT_location(<5> OP_REG(20))
    0004a8ef AT_name(posi_x)
0004a8f8:<34>TAG_local_variable
    0004a8fe AT_sibling(0004a91a)
    0004a904 AT_fund_type(FT_signed_short)
    0004a908 AT_location(<5> OP_REG(17))
    0004a911 AT_name(posi_y)
0004a91a:<46>TAG_local_variable
    0004a920 AT_sibling(0004a948)
    0004a926 AT_fund_type(FT_signed_short)
    0004a92a AT_location(<11> OP_BASEREG(29) OP_CONST(190) OP_ADD)
    0004a939 AT_name(posi_x_start)
0004a948:<45>TAG_local_variable
    0004a94e AT_sibling(0004a975)
    0004a954 AT_fund_type(FT_signed_short)
    0004a958 AT_location(<11> OP_BASEREG(29) OP_CONST(188) OP_ADD)
    0004a967 AT_name(posi_x_step)
0004a975:<40>TAG_local_variable
    0004a97b AT_sibling(0004a99d)
    0004a981 AT_fund_type(FT_signed_short)
    0004a985 AT_location(<5> OP_REG(19))
    0004a98e AT_name(reverse_flag)
0004a99d:<41>TAG_local_variable
    0004a9a3 AT_sibling(0004a9c6)
    0004a9a9 AT_fund_type(FT_signed_short)
    0004a9ad AT_location(<11> OP_BASEREG(29) OP_CONST(186) OP_ADD)
    0004a9bc AT_name(count0x)
0004a9c6:<43>TAG_local_variable
    0004a9cc AT_sibling(0004a9f1)
    0004a9d2 AT_user_def_type(00049c25)
    0004a9d8 AT_location(<11> OP_BASEREG(29) OP_CONST(172) OP_ADD)
    0004a9e7 AT_name(count_x)
0004a9f1:<43>TAG_local_variable
    0004a9f7 AT_sibling(0004aa1c)
    0004a9fd AT_user_def_type(00049c25)
    0004aa03 AT_location(<11> OP_BASEREG(29) OP_CONST(168) OP_ADD)
    0004aa12 AT_name(count_y)
0004aa1c:<39>TAG_local_variable
    0004aa22 AT_sibling(0004aa43)
    0004aa28 AT_user_def_type(00049c25)
    0004aa2e AT_location(<11> OP_BASEREG(29) OP_CONST(164) OP_ADD)
    0004aa3d AT_name(tmp)

The local variable is named with AT_name, and AT_location refers to its storage location. OP_REG refers to a CPU register by number. It looks like the standard numbers are used, as per https://github.com/ps2homebrew/PS2HDDTester/blob/master/hddtester/r5900_regs.h. OP_BASEREG in combination with OP_CONST refers to an offset from a register. In this case, that's the stack. No more guessing!
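To make the mapping concrete, here is a minimal sketch that turns the two location shapes seen in the dump into the operands you would look for in the disassembly. It is not a DWARF parser: `reg_name` and `format_basereg` are hypothetical helpers, and the table uses the conventional MIPS o32 register names. The names for registers 8 to 15 differ between ABIs, but the dump above only uses 16 to 23, 29, and 30, where the common names agree (30 is called either fp or s8).

```c
#include <stddef.h>
#include <stdio.h>

/* Conventional MIPS register names indexed by register number.
   Entries 8..15 vary by ABI; the rest are the usual names. */
static const char *const mips_reg_names[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra"
};

/* Name for an OP_REG(n) location, or "?" if n is out of range. */
static const char *reg_name(unsigned n)
{
    return n < 32 ? mips_reg_names[n] : "?";
}

/* Render OP_BASEREG(reg) OP_CONST(off) OP_ADD as "off(reg)", the way
   a MIPS load/store operand is written. Returns out for convenience. */
static char *format_basereg(unsigned reg, long off, char *out, size_t out_size)
{
    snprintf(out, out_size, "%ld(%s)", off, reg_name(reg));
    return out;
}
```

With this, pPlayerwk's `OP_REG(30)` reads as `fp`, pTbla's `OP_REG(18)` as `s2`, and pActwk's `OP_BASEREG(29) OP_CONST(192) OP_ADD` as `192(sp)`.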
The dwarf2cpp dump actually includes the local variable names in the functions. I assume you were decompiling actb_init_a?

Code (Text):
//
// Start address: 0x101f6f0
void actb_init_a(_anon1* pActwk)
{
    _anon1* pActwk_w;
    _anon1* pPlayerwk;
    unsigned char** pTbltbl;
    unsigned char* pTbla;
    char patno;
    char userflag;
    short time_x;
    short time_y;
    short posi_x;
    short posi_y;
    short posi_x_start;
    short posi_x_step;
    short reverse_flag;
    short count0x;
    _anon5 count_x;
    _anon5 count_y;
    _anon5 tmp;
It does have the local variable names, but when I read the assembly code, I see references to registers, or worse, stack offsets. Now I can link the two when translating to C without guessing.
How do we know that Sonic Gems Collection was compiled with Metrowerks CodeWarrior? I'm asking because my installation of CodeWarrior doesn't come with everything that's necessary to compile for the PS2 (and the documentation treats the PS2 as an afterthought), which makes me wonder how you get it to work. Presumably you need the SDK, but that has its own compiler, so you don't need CodeWarrior.
While I can't answer how to get it to work, I can mention these two things:

1. Its linker (which I was able to run by itself) was able to parse the debug info, which no other tool has been able to parse.
2. These signatures appear in each ELF file (the former even shows up repeatedly), "MW" being short for "Metrowerks":

The GameCube version has even more stuff:

Alongside a bunch of stuff for "MetroTRK", a target resident kernel that acts as a debug monitor for the CodeWarrior debugger, that starts with this: