don't click here

Sonic 2 Split Disassembly

Discussion in 'Engineering & Reverse Engineering' started by FraGag, Oct 6, 2008.

  1. FraGag


    Tech Member
    Xenowhirl's Sonic 2 split disassembly is now up on the SVN server, as Scarred Sun announced. This will allow everyone to improve the disassembly, but keep in mind that the built ROM must be identical to Sonic the Hedgehog 2 REV01. The ROM is included in the repository (named s2rev01.bin), and a batch file (chkbitperfect.bat), which uses this file, is provided to allow you to check if you didn't do anything wrong. Please, do not commit if the built ROM is not identical to Sonic 2 REV01!

    Before committing, please leave a comment about what you changed in changelog.txt. You can also put the message in the message field when committing your changes.

    Another reason I've created this topic was to gather opinions if someone wants to change something, but doesn't know if it will be well received. You don't need to get approbation for every change, but if you're unsure, just ask here. Also, if you want to revert someone's changes, please ask here first, unless what you want to revert is vandalism.

    So, I'll go first! I've changed build.bat to make AS output the errors to a separate file (s2.log) using the -E switch, instead of outputting them in the console. AS will delete the file if there are no errors, so it is possible to test the presence of this file to determine if there were build errors/warnings. Having the errors in a file makes it easy to see all the errors, because the console may cut off the start of the errors if there are too many. It will also make it easier for starters to share their errors. Is this a good idea?

    Another change I'd like to see is having the code split to several .asm files. If you've worked with the modified version of Hivebrain's Sonic 1 disassembly targetting AS, you may have remarked that when the messages appear, they can be lost (because the console's output is cut off) when there are a lot of files processed after that error. I believe outputting the error messages to a separate file will solve this problem and allow us to split the code without too many worries. What do you think?
  2. As I said yesterday on IRC, this is awesome stuff. One question I had in mind was that what naming scheme should we follow for new or renamed names? Hive's 2005 scheme is pretty inconsistent, as is Xeno's, so it can be a bit of a hassle thinking of a new name. Another thing to note is that when changing labels, it's a good idea to note the old label name in comment form above it (as done already in many places by Xenowhirl), so that anyone searching for the old name can figure out the new one easily. (I forgot to do this myself yesterday, I'll fix that now).

    The thing I dislike about splitting it like that is it makes searching much harder - for example if you want to find out where a RAM variable is used, it's far easier to do that when you have one large ASM file rather than multiple split ones. On the other hand, it makes editing much easier, so I'm pretty undecided.

    The version of build.bat at the SVN doesn't have that change in it though :S

    EDIT: Nvm, I modified it. I also added an if condition so that if the first parameter is -pe (I.e. you run it as build -pe or build.bat -pe) the errors are printed out on screen rather than being sent to file. chkbitperfect.bat has also been modified to support this parameter.
  3. Double posting because I noticed this in qiuu's log message:

    I'm fine with all three, except with Chunk_Table instead of ChunkTable for consistency with all the other equates. What do you guys think?
  4. Scarred Sun

    Scarred Sun

    Be who you needed when you were younger Administrator
    Tower 8 ️
    Welp, this.
    I'm not sure how feasible this would be for building purposes, but would it be possible to just put a switch in split.bat to allow for both options?
  5. Sik


    Sik is pronounced as "seek", not as "sick". Tech Member
    being an asshole =P
    It should be possible. I know how to deal with both command line parameters and asking for simple input (e.g. one key stroke) in batch files.
  6. FraGag


    Tech Member
    It was a suggestion :P. That's why I didn't commit it yet. I also added a check to see if the log file exists, I'll (merge and) commit when I'm home.

    That's good for me.

    Another thing I've been wondering about is how the objects are labelled. Currently, objects are identified with their object ID. I think it's easier to remember names rather than numbers, so I suggest we rename e.g. Obj01 to Sonic, Obj25 to Ring, Obj26 to Monitor, etc., and all labels starting with these names. Stealth's Sonic & Knuckles disassembly uses a similar approach because some objects are not in an object list, but that doesn't mean we should not adopt this idea.

    About splitting the code, I agree that it makes searching more difficult. At the very least, I think that we could put the named constants in a separate file, so when one opens s2.asm, he sees the start of the code almost right away. Another advantage I see is that, when giving a name to a RAM variable, I usually add the constant, then want to do a global search & replace in the whole document, but that would also rename the value in the constant declaration itself (giving something like X = ramaddr( X )), and by having the constants in a separate file, searching & replacing wouldn't replace the value of the constant with the name of the constant. Shall we do this?
  7. I agree, though I'd prefer ObjSonic or Obj_Sonic over just Sonic.

    I've encountered that a shitload of times, so definitely yes.
  8. FraGag


    Tech Member
    Done. The constants are now in s2.constants.asm.
    Stealth used the latter format in his Sonic & Knuckles disassembly. I think it's a good compromise, but some objects may be hard to describe and end with very long names. However, I'd like to hear more opinions on this before we start renaming.

    Speaking of long names, do we want to keep the 15 characters limit on the labels? I think it doesn't make much sense to keep this limit since most of the code is indented with 1 tabulation. I restricted myself to 23 characters when labelling the object pointers in Obj_Index to avoid making the lines too long, but if nobody objects, we could give longer names instead of sometimes cryptic abbreviations that are not very used. For example, I think we should put "Horizontal" instead of "Horiz" and "Vertical" instead of "Vert," but we can keep common abbreviations like "Pal," Obj" and "Ptr."
  9. Sik


    Sik is pronounced as "seek", not as "sick". Tech Member
    being an asshole =P
    I guess the 15 character limit is so to make sure it's compatible with old assemblers (yet, if that was the case, we should restrict it to 8 characters instead). But none of snasm68k, asm68k and asmx have such a limit, so it's really pointless. I'd go with a 31 character limit. Object names don't need to be so descriptive either, abbreviations can be tolerated.

    Also, I use H and V rather than Horiz and Vert, for the win =P
  10. Qjimbo


    Your friendly neighbourhood lemming. Oldbie
    Will versions of this in non-SVN form be available periodically on the Disassemblies page or will users be required to install an SVN client to get a recent version?
  11. Puto


    Shin'ichi Kudō, detective. Tech Member
    Portugal, Oeiras
    Part of Team Megamix, but haven't done any actual work in ages.
    Personally I think that problem would be non-existant if the option to download a tarball of the entire repository was turned on in the WebSVN config...
  12. I had no idea we even had a limit. About abbreviations, personally I think Horiz and Verti/Vert are not that hard to understand, but I don't really mind either way.
  13. FraGag


    Tech Member
    My main concern about label names is that if we try to give every label a significant name, we'll have to use quite long names in some places.

    shobiz, the 15 characters limit comes from IDA, who suggests this by default, although it can be turned off. 31 characters is already better, but I don't like the idea of having a "conventional" limit if we can technically use longer labels. Obviously, giving very long names to labels that are use very frequently is not desirable. Maybe we could use more temporary symbols? Nameless temporary symbols are already used, but maybe we could try using named temporary symbols in large blocks of code like objects, where some labels are only referenced within that object, but using nameless temporary symbols wouldn't work. Also, when replacing labels like "loc_ABC" to temporary symbols, should we really keep the old label? I know some guides reference a few of them, so it may be a bit difficult to determine if some label should be kept or not.
  14. Sik


    Sik is pronounced as "seek", not as "sick". Tech Member
    being an asshole =P
    The problem about removing character limits is that assemblers may not like it. I said 31 because it seems coherent to me; in C, the standard defines that compilers should at least remember the first 31 characters of an identifier name, and I don't think this situation may be so different in assemblers. So I wouldn't go over that limit or we run the risk of it not working on assemblers for real.
  15. ICEknight


    Researcher Researcher
    Or obj01_Sonic, so you don't have to look for its object number somewhere else?

    Also, having the object number in there will allow to put certain objects as obj45_unk, obj46_unk, etc, when they're unknown, without reusing the same name.
  16. Oh, you meant that, I thought it was either a guideline or a technical restriction we were following. Though yeah, it's possible to have much longer names in IDA, I even have a 50 character long name (Obj_LightningNullifiedInvisibleHurtBlockHorizontal).

    It's a bit of a slippery slope, but IMO it has the potential to clutter up the code quite a bit, so I feel we shouldn't.

    I really like the idea, but I believe you'd have to use a code section to achieve that effect, since, to quote the AS manual,
    EDIT: On a different note, is it really necessary to have the original hex form of the command as a comment after almost each instance of the vdpComm function? I believe it was qiuu who added them, but I think they're pretty pointless since the point of that function is to save people the trouble of using that syntax.
  17. qiuu


    Tech Member
    Blue Ball & Blocks
    I had added the vdpComm replacements to an older version of the disassembly the evening before they were uploaded by someone else (not sure by whom), but was without internet back then. I had done these changes manually so I left the original values commented there just in case I screwed something up.
    As both my older non-uploaded disassembly as well as the new one online the next day had made some changes the other hadn't, I sort of merged them together leaving the comments from my version.
    So yeah, I can simply remove them again.

    As for the object labels, I'm pretty indifferent on this. Advantage of leaving the number perhaps is that Obj##_ can easily be added in front of object specific subroutines, while adding the object name would make these somewhat lengthy.
    I also think some of my label names are longer than 15 characters, my use of abbreviations is also somewhat inconsistent as well, (like LevelSelect vs. LevSel, or abbreviating Horizontal with Hztl and Vertical with Vrtc). I guess it would be good for consistency's sake to decide on which abbreviations to use.
  18. Ah, I see. I'll mass-replace them by regular expression then.

    Update: Done.

    Agree completely, it would make coming up with new names a lot easier as well.
  19. Sik


    Sik is pronounced as "seek", not as "sick". Tech Member
    being an asshole =P
    I'd use the object name for known objects and "Obj##" or "ObjUnknown##" for unknown ones, and leave the usage of both only when there's a name conflict =P
  20. FraGag


    Tech Member
    Why should we care about object IDs? The original IDs are still in comments in the dynamic ID declarations in s2.constants.asm. The point of the dynamic IDs is that you can completely remove several useless objects in a hack and still have the game reference the correct objects (the object layouts would have to be changed, but I couldn't make them use the dynamic IDs and make them usable in SonED2 at the same time).

    I pretty much identified every object; the entries in Obj_Index without a name are most probably not used at all in the game; I also inspected the object layouts using a small program I made, but I may have missed some. Anyway, it's probably not very important to work on those unused objects. (The only objects marked as unknown are $AB and $BB.)

    About the routine names and such, I thought we could make a function/macro to prepend the name of an object (stored in a global variable) when declaring a label inside an object's code.

    If we have a list of "standard" abbreviations somewhere, I'm fine with that. :words: