
Sonic CD decompilation

Discussion in 'Engineering & Reverse Engineering' started by BenoitRen, Jul 17, 2023.

  1. BenoitRen

    BenoitRen

    Member
    141
    58
    28
    It has been known for years now that the Sonic CD version included as part of Sonic Gems Collection (which is a port of the PC version) comes with debug symbols.

    Last year, Devon dug deeper and produced a disassembly using the linker of the original compiler. Some time later, he found a way to extract even more debug information. With that, a skeleton of the original source code repository became available, and he showed a sample decompilation.

    That's when I joined Sonic Retro and started this project, which aims to restore the original C89 source code. This is possible because part of the recovered debug information links line numbers of the original source code to groups of MIPS assembly instructions. Of course, comments can't be recovered, which results in lots of whitespace at times.
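
    To illustrate the kind of mapping described above, here is a minimal sketch in C. It is not the actual debug format: the LineEntry struct, the line_for_address function and the sample addresses are all made up for illustration. The idea is only that each entry marks where a group of instructions starts and which source line it came from, so any address can be traced back to a line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line-table entry: the first address of a group of
   instructions and the source line that group was compiled from. */
typedef struct {
    unsigned long start_addr;
    int line;
} LineEntry;

/* Returns the source line whose instruction group contains addr,
   or -1 if addr lies before the first entry.
   Assumes the table is sorted by ascending start_addr. */
static int line_for_address(const LineEntry *table, size_t count,
                            unsigned long addr)
{
    int line = -1;
    size_t i;

    assert(table != NULL);

    for (i = 0; i < count; i++) {
        if (table[i].start_addr > addr)
            break;
        line = table[i].line;
    }
    return line;
}

/* Made-up sample data; the addresses and line numbers are illustrative only. */
static const LineEntry sample_table[] = {
    { 0x00100000UL, 12 },
    { 0x00100008UL, 13 },
    { 0x00100018UL, 17 }
};
```

    With the sample table, address 0x0010000C falls in the group that starts at 0x00100008, so it maps to line 13. The jump from line 13 to line 17 is the kind of gap where comments or blank lines may have been in the original file.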

    After dedicating almost all of my free time for the past three months to this project, I've hit the first milestone: the decompiled main/root files are now available!

    These files can't be considered finished yet, though. I haven't yet figured out how the global variables are structured across files and in which file they belong. Hopefully, that'll become clearer while decompiling the rest of the files.

    As you can see, the source code is currently hosted at my website, but I'd like to upload it to a Git repository in Europe. I was thinking of NotABug. As for the license, I want to release this into the public domain, so I was thinking of using Unlicense. What are your thoughts on this?

    Also, does this mean I can start rambling about quirks I found? :)
     
    Last edited: Jul 18, 2023
  2. Billy

    Billy

    RIP Oderus Urungus Member
    2,079
    151
    43
    Colorado, USA
    Indie games
    As for Git hosting, I imagine anything that lets people clone the repo will be fine, as long as it can also handle pull requests and such if you want people to be able to contribute.

    Licensing I'm far from an expert on. Obviously people can't legally sell Sonic CD, but I'm guessing you just want a public domain license with a "software is provided as-is with no warranty, etc." disclaimer, so I imagine that'd be fine. Looks like the Mario 64 decomp project does something similar and uses CC.
     
  3. Devon

    Devon

    Powered by a malfunctioning Motorola 68000 Tech Member
    1,040
    1,053
    93
    your mom
    I dunno anything about licensing or hosting, but I'm actually really happy to see this. Can't wait to see more progress.
    This thread could use some new posts ;)
     
  4. BenoitRen

  5. Brainulator

    Brainulator

    Regular garden-variety member Member
    Personally, I'd like to see if there's a way we can make clear which labels are original to the code and which ones were made up in lieu of better information.

    Was this taken from the PS2 version of Gems Collection or the GCN version?
     
  6. BenoitRen

    I had to name most of the structs and all of the unions, as their names weren't included. Would a wiki page listing those work? Everything else is original.

    The PS2 collection's version is the one that's being decompiled.
     
  7. BenoitRen

    I'm still hard at work decompiling the files related to R11A (the first act in the present), and am almost done.

    At the same time, I've also pushed the root files through a C compiler, fixing all the compilation errors and gaining a better understanding of the global variables. The result of that work is now available.
     
  8. BenoitRen

    Over the past week, I've finished decompiling the files related to R11A, and have been fighting with Metrowerks CodeWarrior, the compiler that was originally used to compile the game. Yes, "fighting", because getting it to work is a pain. The documentation in general is lacking, and the little PS2-specific documentation there is seems like an afterthought.

    After lots of frustration, I was able to compile an ELF file that resembles the original. What's different is that the code section is supposed to be at a much higher address, and something must be going on with the data section because not all global variables are stored in it.

    Despite this, I was able to start comparing assembly. It's interesting how small changes that achieve the same end result affect the output when compiled without optimisations (because, yes, this game was compiled with *all* of them off!).

    For example, if you want to test that an integer is not zero, you can do this:
    Code (Text):
    if (someNumber != 0)
    However, this generates an extra instruction compared to this:
    Code (Text):
    if (someNumber)
    In all cases I've seen thus far, the second notation seems to be what's used.
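
    As a small self-contained sketch of the two notations, the snippet below wraps each test in its own function. The function names are made up for illustration and are not from the game's source. In C, the two conditions mean exactly the same thing, so any difference can only come from how the compiler translates them.

```c
#include <assert.h>

/* Explicit comparison against zero. */
static int test_explicit(int someNumber)
{
    if (someNumber != 0)
        return 1;
    return 0;
}

/* Implicit truth test: any non-zero value counts as true in C. */
static int test_implicit(int someNumber)
{
    if (someNumber)
        return 1;
    return 0;
}

/* Sanity check: both notations must agree for the given input. */
static void check_same_result(int someNumber)
{
    assert(test_explicit(someNumber) == test_implicit(someNumber));
}
```

    Both functions return the same value for every input; as described above, the difference only shows up in the unoptimised assembly.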

    There are several differences that I can't explain, however. For example, in some cases when a function argument is assigned to a register, it's assigned twice in the assembly I generated.

    I'll be continuing the comparison for now, as it does make the code more complete and has unearthed a mistake, but I don't think I'll be able to get everything like the original without help.
     
  9. Devon

    To be fair, this is basically a debug build, considering all the debugging information left in. Leaving the compiled code unoptimized allows the developers to easily debug step by step as the program runs. The good news, at least, is that you can get a more or less 1:1 recreation of the source code, whereas optimizations would've stripped and rearranged some stuff.
     
  10. Black Squirrel

    Black Squirrel

    bed 'n' breakfast Wiki Sysop
    8,054
    2,025
    93
    Northumberland, UK
    Wiki and Minnie's Runaway Railway
    I don't know if their IDE of choice would let them switch between "debug" and "release" builds like you'd have today, but there are often bugs exclusive to release mode.

    If you're a year out from the actual release, and you know big chunks of the codebase are likely to change, fixing these bugs isn't a priority. Just compile in debug - it's not like anyone's going to break in and analyse the code oh
     
  11. BenoitRen

    I've now created the Missing symbols wiki page for this.
     
  12. Brainulator

    Thanks. May I ask why, though, you replaced the symbols for certain structs which did have their names saved?
     
  13. BenoitRen

    I figured that the type name would probably be different from the tag. For example, it's unlikely that tagPOINT would also be the type name. That logic doesn't hold true for dlink_export, though, so I'll change its type name to be the same.
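
    For anyone unfamiliar with the distinction between a tag and a type name, here is a minimal sketch. It follows the common Win32-style convention of pairing a tagPOINT struct tag with a POINT typedef. The members and the sum_coords helper are assumptions for illustration only, not the recovered definitions.

```c
#include <assert.h>

/* "tagPOINT" is the struct tag; "POINT" is the typedef (type) name.
   The members are assumed for illustration. */
typedef struct tagPOINT {
    long x;
    long y;
} POINT;

/* "struct tagPOINT" and "POINT" name the same type, so a function
   declared with one spelling accepts objects declared with the other. */
static long sum_coords(const struct tagPOINT *p)
{
    assert(p != 0);
    return p->x + p->y;
}
```

    A variable declared as POINT can be passed to sum_coords directly, which is why only the tag needs to survive in the debug information for the struct to be usable, even if the typedef name has to be guessed.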

    If anyone has suggestions for better names, I'm all ears. I intend to change act_info to spr_sts_tbl (sprite status table) for the sake of consistency.