
Sonic Adventure 2 Kart DLC reverse engineering

Discussion in 'Engineering & Reverse Engineering' started by flarn2006, Aug 16, 2010.

  1. flarn2006

    You know how you can download Kart tracks from Sega in Sonic Adventure 2? Is there any information on how to make custom tracks? I know the Kart tracks aren't already stored on the game disc, as the VMU files have different sizes (if they just enabled what was already there, odds are they would have the same size) and I remember Sega doing some contest or something where you could submit track designs and the winners would be made into playable tracks. I even remember reading that the person who sent in the Eggrobo track design also sent in a clay model of the Eggrobo car.

    EDIT: (And a very late edit, 1/13/20.) I've changed the name of the thread, which was previously called "Sonic Adventure 2 Kart track editor?".
    If a moderator sees this, feel free to move it to "Engineering & Reverse Engineering"; I don't think I'm able to do it myself.
     
    Last edited: Jan 14, 2020
  2. MainMemory

    I would imagine that they use the same format as levels (a collection of models and other information for solidity etc), so you would need a whole level editor for that.
     
  3. Sappharad

    I'm reasonably convinced that they don't.
    It has been several years since I've looked at the files, but I recall that they appeared to be much too small to contain anything more than a layout for pre-existing level chunks and small models for single objects such as the downloadable Opa Opa kart.

    They're actually fairly easy to mess around with if anyone plans to experiment with them. Unlike all of the SA1 downloads, the SA2 Kart race downloads are not encrypted and the only signature on them is the standard VMS checksum. You can easily change the title and description of a track download with nothing more than a hex editor and VMS checksum fixer.
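A minimal sketch of what a "VMS checksum fixer" does, assuming the checksum is the commonly documented VMS CRC-16 (polynomial 0x1021, initial value 0, computed with the CRC field itself zeroed). The function names are made up for illustration, the 0x46 offset comes from the notes later in this thread, and the caller is assumed to pass exactly the bytes the checksum covers.

```python
def vms_crc16(data: bytes) -> int:
    """CRC-16 with polynomial 0x1021 and initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def fix_vms_checksum(vms: bytes, crc_offset: int = 0x46) -> bytes:
    """Return a copy of the file with the two-byte CRC field recomputed.

    Assumption: the CRC covers the whole buffer passed in, with the
    CRC field treated as zero while computing.
    """
    buf = bytearray(vms)
    buf[crc_offset:crc_offset + 2] = b"\x00\x00"
    crc = vms_crc16(bytes(buf))
    buf[crc_offset:crc_offset + 2] = crc.to_bytes(2, "little")
    return bytes(buf)
```

After editing the title or description in a hex editor, running the file through something like `fix_vms_checksum` would be the last step before copying it back to a VMU.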

    On another note, I've probably mentioned this before but the menu to load downloaded kart race files was never removed from the GCN port of the game. I never bothered to determine if it actually tries reading from the memory card since DC files would never be compatible anyway due to the endian difference between the GCN and the DC, but it may be worth looking into if anyone ever documents how they work on the DC.
     
  4. flarn2006

    Why would the SA1 downloads be encrypted? And what was there besides just files that unlock stuff that's already there? (the Christmas thing for instance.) Wasn't there something that put up ads for some cell phone company all over Speed Highway? Was that already in the game? (If not, it would be cool if you could reverse engineer the format it used -- maybe you could make your own objects and place them in levels!)

    Oh, and it would be really cool if you could somehow enable the download extra menu on the GC using an Action Replay code, then use a Gameshark GameSaves card to install the tracks. Was that Download Extra menu ever used for anything other than Kart tracks?

    EDIT: It was AT&T. I forgot it actually had a whole time attack challenge. It said the top scorers would win a prize...what's to stop them from using a GameShark to hack it? And the extra time attack HUD added (in the lower-right corner) looks like the same font as the debug text in Sonic Adventure DX: Preview, btw. (YouTube)
     
  5. Diablohead

    I thought all 8 or so tracks were already in the game and the download file just unlocks it? first-gen DLC style.

    Same for voice samples in sonic adventure 1 and 2, already on the disc.
     
  6. Sappharad

    Voice samples, yes for both. In SA1 there were 6 or so Twinkle Cart courses on the disc.

    But for SA2, no evidence has been found that the courses are on the disc. It seems likely and feasible to me that the courses were actually part of the downloads. I don't think we're going to get an answer either way though unless someone feels like putting a few hours into investigating. If you wait a few years, I might get some time to look.
     
  7. flarn2006

    It's more than just likely. How else could they have done that contest after the game was already released?
     
  8. Diablohead

    I forgot all about that contest, more or less confirms that the dlc itself is the layout, then.

    Someone get to work :v:
     
  9. Sappharad

    Okay, so I started documenting the format of the Kart Race data.

    After all of the header information, the actual data portion of the file is PRS compressed. Once you decompress that, you will find a container that contains several additional files. Among those is a PVM file with several PVR textures. It also appears to contain model data. This confirms what we already knew, that the models are located inside of the downloads. In addition, there is a SET file in there which follows the standard format for the object layouts for the level. I'm not sure what the actual course data might look like, but I believe it might be the file right after the SET file. It contains a bunch of 16-bit values, most with the structure 0x00?? but some with 0xFF?? scattered in there. There's no obvious pattern but some of the 0x00?? values repeat, which makes me think it could be a layout stored in a grid, similar to how the 2D levels were stored. I could be completely wrong, but this is just taking a stab in the dark until it can be further investigated.

    I'm done looking for now, but with this information you should have enough details to decompress one of the downloads, try to modify some stuff, then re-compress the PRS and see what you did. I don't have time to go any further with this for now, so I really hope someone else does something with this before I get a chance to continue.

    Here are my notes so far:
    Code (Text):
    Sonic Adventure 2 Kart Race VMS file format
    Note: All multi-byte numbers are little endian unless otherwise noted.

    The usual VMS header information begins at 0x0000.
    0x0000+0x010: Download title. End padded with 00's.
    0x0010+0x020: Always "SONIC ADVENTURE 2 / Download", end padded with 00's.
    0x0030+0x010: Application that generated file. Always 00's in official downloads.
    0x0040+0x002: Number of file icons (Always 01. Can be larger for animation)
    0x0042+0x002: Icon animation speed. (Always 01. Not animated)
    0x0046+0x002: VMS CRC16 checksum (Unused, set to 00)
    0x0048+0x002: Size of data following the icon
    0x0060+0x020: Icon color palette
    0x0080+0x200: Icon frame

    Actual file data begins at 0x0280. Remember, Little Endian!
    0x0280+0x004: Always 0x04
    0x0284+0x014: A set of 32-bit memory addresses, in increasing order. Each subsequent address increases by 0x14.
    These memory addresses correspond to System RAM in Privileged mode with caching enabled.
    These are never the same between the 3 released tracks, which implies that all of them may get loaded into memory
    at the same time.
    0x0298+0x004: Always 0x46.
    0x029C+0x01C: Filled with 0xFF.
    0x02B8+0x004: A 32-bit memory address.
    0x02BC+0x???: What appears to be a series of 16-bit values with no predefined size. I assume the high byte of them
    corresponds to a command, while the low byte may be the value. 0x0000 appears to be a null because in a later
    download it's omitted. 0x83 appears to be the most common command, although there are several other 0x8_ commands.
    The 0x6A command followed by 0x81 appears to denote the end of these commands, which is followed by 00.

    After these commands, the Kart text begins. Kart text is formatted as follows. All strings are padded with 00's at the end,
    and the next string begins at the next 32-bit boundary after that point. Thus the locations and sizes are not static.
    =============
    Course Title (Prefixed with a Tab character, 0x09)
    Kart Name
    Course name
    Course description
    =============
    This is repeated 4 times, once each for English, French, Spanish, German.

    After this point, the data section of the file begins. This file makes up the remainder of the VMS file. It is PRS compressed data.
    Here is what's known of the compressed data:

    0x0000+0x004: 32-bit value. Not sure yet?
    0x0004+0x004: 32-bit value. Location of the SET file.
    0x0008+0x004: 32-bit value. Size of SET file.
    0x000C+0x004: 32-bit value. Not sure yet….
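For anyone who wants to poke at a download, here is a rough sketch that reads the fixed fields at 0x0280 described in the notes above. It only reflects what the notes say; the function name and dictionary keys are invented for illustration, and the 0xFF run is assumed to extend from 0x029C up to the address field at 0x02B8.

```python
import struct

DATA_START = 0x280  # file data begins here, per the notes


def read_kart_fields(vms: bytes) -> dict:
    """Read the little-endian fields that follow the VMS header and icon."""
    (first_word,) = struct.unpack_from("<I", vms, DATA_START)
    # Five 32-bit addresses stored back to back (0x14 bytes in total).
    addresses = struct.unpack_from("<5I", vms, DATA_START + 0x04)
    (marker,) = struct.unpack_from("<I", vms, DATA_START + 0x18)
    # Assumption: the filler runs right up to the address at 0x2B8.
    filler = vms[DATA_START + 0x1C:DATA_START + 0x38]
    (last_address,) = struct.unpack_from("<I", vms, DATA_START + 0x38)
    return {
        "first_word": first_word,      # notes: always 0x04
        "addresses": addresses,
        "marker": marker,              # notes: always 0x46
        "filler_is_ff": all(b == 0xFF for b in filler),
        "last_address": last_address,
    }
```

Printing the result for each released download and comparing them side by side is a quick way to check the "always" claims in the notes.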
     
  10. flarn2006

    Is this the same on the Gamecube version? (except for the endianness) I don't have a Nexus memory card, but I do have a GameShark USB memory card for the Gamecube.
    Oh, and thanks for starting to decode the format! :thumbsup: +1 Internet
     
  11. MainMemory

    The GameCube version didn't have VMS files as far as I'm aware.
     
  12. Sappharad

    The only thing I know about the Gamecube version right now is that it would be looking at the memory card for files with names following the structure SONIC2B__D##. (I believe I found references to that in the string table) Since the internet features were cut from the GCN version, there's no sample data to look at to see how those files would be structured. I also have no idea if the game actually has logic still there to load them. Given that there are references to actual memory addresses in the DC version, converting the current downloads to work on GCN would not be as simple as flipping the endian-ness everywhere. While I think it would be really neat to try and restore download tracks for the GCN version, I don't think it's going to be realistically feasible for a long time. Either way you'd need an Action Replay and some method of getting files onto your memory card, making anything for the GCN version relatively inaccessible to the majority of people.

    Back on topic, I have to wonder if the container inside of the decompressed PRS is an existing format that we already know. It's not AFS, but I would have to guess that they wouldn't put effort into making a new container format just for the VMU downloads. When I realized the data was compressed, PRS was the first thing I tried but I'm not too familiar with some of their container formats other than AFS.
     
  13. flarn2006

    Ah yes; forgot about that...
    I do have both of those, by the way, in case you're wondering.
     
  14. flarn2006

    Alright, this is an old thread but I've finally gotten around to taking a deeper look at this, and I thought I'd share my findings.

    TL;DR: I figured out what all six of those 32-bit addresses are for. I also figured out that series of 16-bit values sonicblur mentioned. I'll explain it all here; scroll to the bottom of this post for a summary.

    Going by sonicblur's observation of certain values representing RAM addresses, it appears that the entire VMS data (everything but the VMS header) is loaded into memory at address 8CB00000 during processing. While I haven't looked at the RAM myself to confirm this, what I can confirm is that the purpose of these addresses is to refer to data within the VMS.

    Using the Eggrobo track as an example, the memory addresses, in order, are:

    (0x280 + 0x4 = ) 0x284: 8CB06A74
    (0x280 + 0x8 = ) 0x288: 8CB06A88
    (0x280 + 0xC = ) 0x28C: 8CB06A9C
    (0x280 + 0x10 = ) 0x290: 8CB06AB0
    (0x280 + 0x14 = ) 0x294: 8CB06AC4
    ...
    (0x280 + 0x38 = ) 0x2B8: 8CB00244

    (Note: The '+' notation here refers to addition, not field length.)

    Look at the lower half of each address (the last four hex digits). These can be treated as offsets into the same data file we're looking at, past the 0x280-byte header. Let's look at the last one (8CB00244) first: 0x280 + 0x244 = 0x4C4. What's at 0x4C4 in this VMS file? Well, that's where the PRS-compressed portion begins. The Fantasy Zone one has address 8CB00294 in this position, and sure enough, its PRS blob starts at (0x280 + 0x294 = ) 0x514 in the VMS file. So it's a safe bet that the four-byte field at 0x2B8 holds a pointer to the start of the PRS data.
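The arithmetic above, as a small sketch. It assumes the load address 8CB00000 inferred in this post (not confirmed against real RAM) and the 0x280-byte header; the constant and function names are just for illustration.

```python
LOAD_BASE = 0x8CB00000  # assumed address where the VMS data is loaded
DATA_START = 0x280      # size of the VMS header and icon before the data


def address_to_file_offset(address: int) -> int:
    """Convert a stored memory address into an offset within the VMS file."""
    relative = address - LOAD_BASE
    if relative < 0:
        raise ValueError(f"address {address:08X} is below the load base")
    return DATA_START + relative


# Eggrobo and Fantasy Zone PRS pointers from this post:
print(hex(address_to_file_offset(0x8CB00244)))  # 0x4c4
print(hex(address_to_file_offset(0x8CB00294)))  # 0x514
```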

    Before I get into the other five, I'd like to point out one error in sonicblur's observation: the PRS data does not, in fact, fill the remainder of the file. There is a small amount of additional uncompressed data at the end, and this is where those first five addresses point. (As for why those addresses differ in every released track, it's simply because the data before them varies in size.)

    Now let's take a look at this uncompressed data at the end. As sonicblur noticed, the pointers into it are 20 (decimal) bytes apart. There are five addresses, and this blob of data (minus what appears to be some 00 padding at the end) is exactly 100 (decimal) bytes long. So it looks like it's actually five separate blobs of data, each one 20 bytes long. As for the contents of each of these blobs of data, each one consists of five 32-bit integers, once again in the 8CB0xxxx range. Lists of pointers, presumably. Here's all of them from the Eggrobo DLC:

    8CB06A74: 8CB0003C, 8CB00050, 8CB0005C, 8CB00050, 8CB0006C
    8CB06A88: 8CB000BC, 8CB000C8, 8CB000D4, 8CB000C8, 8CB000E4
    8CB06A9C: 8CB0012C, 8CB00138, 8CB00144, 8CB00138, 8CB00154
    8CB06AB0: 8CB0012C, 8CB00138, 8CB00144, 8CB00138, 8CB001A0
    8CB06AC4: 8CB0012C, 8CB00138, 8CB00144, 8CB00138, 8CB001F4

    Okay, lots of duplicate addresses in here; most notably the last three lists differ only in the last address. As it turns out, it's no mystery what these pointers are for: they point to the strings earlier in the file. So, those first five addresses point to string tables.
    1. The label that appears in the Download Event menu. (Preceded by a tab character, interestingly—I tried removing it and the text appeared on the very left of the screen instead of the center.)
    2. The text that appears after "Type:".
    3. The text that appears after "Stage:".
    4. The text that appears after "Character:".
    5. The description text.
    Except, wait, hold the phone! The five addresses in the first table (8CB0003C through 8CB0006C) don't point to any string data I've seen. They point into the section of data with those strange 16-bit values sonicblur found. Luckily, there are enough clues here to figure out what that data is: there's text in four different languages (English, French, Spanish, and German), but aren't there five language choices in-game? The other one is Japanese. Let's take a look at some of this data:

    09 83 47 83 62 83 4F 83 8D 83 7B 93 6F 8F EA 81 49 00 00 00

    Before UTF-8 took over everything, Japanese text was often encoded in Shift-JIS. (I wouldn't be surprised if it was still common today though.) If we interpret that data as Shift-JIS-encoded text, we get: " エッグロボ登場!" (The fact that it ends with an exclamation point looks promising.) Put this into Google Translate, and it says "Egg Robo Appears!"
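If you want to reproduce that decode without an online converter, a quick sketch using Python's built-in `shift_jis` codec would look like this. The helper name is made up; the bytes are the ones quoted above.

```python
def decode_kart_string(raw: bytes) -> str:
    """Strip the trailing 00 padding and decode the rest as Shift-JIS."""
    return raw.rstrip(b"\x00").decode("shift_jis")


raw = bytes.fromhex(
    "09 83 47 83 62 83 4F 83 8D 83 7B 93 6F 8F EA 81 49 00 00 00"
)
text = decode_kart_string(raw)
# The leading character is the tab (0x09) that prefixes the title.
print(repr(text))
```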

    Mystery solved.

    So, to recap, here's what all six addresses point to:

    (0x280 + 0x4 = ) 0x284: Japanese string table. (Shift-JIS encoded)
    (0x280 + 0x8 = ) 0x288: English string table.
    (0x280 + 0xC = ) 0x28C: French string table.
    (0x280 + 0x10 = ) 0x290: Spanish string table.
    (0x280 + 0x14 = ) 0x294: German string table.
    ...
    (0x280 + 0x38 = ) 0x2B8: PRS-compressed stage data.
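Putting the recap together, here is a sketch of how one might walk the five string tables. It assumes the 8CB00000 load address and the table layout described in this post (five tables of five 32-bit pointers, strings terminated by 00). The strings are returned as raw bytes because only the Japanese encoding has been identified here; all names are hypothetical.

```python
import struct

LOAD_BASE = 0x8CB00000  # assumed load address of the VMS data
DATA_START = 0x280      # VMS header and icon size
LANGUAGES = ("Japanese", "English", "French", "Spanish", "German")


def to_offset(address: int) -> int:
    """Convert a stored memory address into a file offset."""
    return DATA_START + (address - LOAD_BASE)


def read_cstring(buf: bytes, offset: int) -> bytes:
    """Return the bytes from offset up to (not including) the next 00."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end]


def read_string_tables(vms: bytes) -> dict:
    """Follow the five table pointers at 0x284 and collect each table's strings."""
    table_addresses = struct.unpack_from("<5I", vms, DATA_START + 0x04)
    tables = {}
    for language, table_address in zip(LANGUAGES, table_addresses):
        string_addresses = struct.unpack_from("<5I", vms, to_offset(table_address))
        tables[language] = [read_cstring(vms, to_offset(a)) for a in string_addresses]
    return tables
```

Each list holds the five strings in the order given earlier in this post: menu label, type, stage, character, description. The Japanese entries can then be decoded as Shift-JIS.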
     
    Last edited: Jan 14, 2020
  15. MainMemory

    You know, there's someone in the X-Hax Discord who I think has entirely cracked the format and is working on restoring it to the PC port, which didn't even exist at the time this thread started!
    I can say for certain that my post earlier in this thread was entirely wrong, the tracks are not set up like normal levels, they're made of a sequence of premade parts. The SA2 Randomizer mod has an option to generate a randomized layout for all five kart tracks.
     
  16. flarn2006

    Wait, really? :O
    I'll be sure to check that out! Thanks!
    Shouldn't the information be posted on Sonic Retro somewhere though? Having it in a Discord server is all right but it's inconvenient to link to, and isn't indexed by Google, hence why I was unaware.