
Making a C++23 Toolchain for the Mega Drive

Discussion in 'Technical Discussion' started by Clownacy, Jun 23, 2024.

  1. Clownacy


    Tech Member
    [Cross-post from Clownacy's Corner]

    One of the great things about C++ is its zero-overhead abstractions, which enable the creation of code which is every bit as performant as C, or even assembly, while being far more concise. Since 2012, I have been programming in assembly for the Mega Drive, whose CPU is a Motorola 68000. This CPU is supported by both GCC and Clang/LLVM, which had me wondering how feasible it is to create homebrew using C++ by leveraging one of these two compilers.

    Of course, there already exist toolchains for using C++ to create Mega Drive homebrew, such as the venerable SGDK, but I wanted to start from scratch, going through the bootstrapping process as if the Mega Drive were a new embedded platform. For this, I would need to learn how to produce a cross-compiler.

    Creating the Cross-Compiler
    My research quickly led me to this invaluable article on the OSDev wiki. It detailed that I would need to download the source code of GNU Binutils and GCC, then configure, compile, and install them both in typical autotools fashion. Despite the very Unix-centric build system, I was able to build both of these on Windows, using MSYS2. I opted to use the path 'C:/msys2/opt/clownsdk' as the installation location, similarly to devkitPro.

    Binutils did not require any special configuration; GCC, however, did. These are the options that I used:

    Code (Text):
    --target=m68k-elf --prefix=/opt/clownsdk --disable-nls --enable-languages=c,c++ --without-headers --disable-multilib --with-cpu=68000

    These options specify that the compiler should only target the Motorola 68000: '--target=m68k-elf' selects the m68k family of CPUs, '--with-cpu=68000' makes the compiler target the 68000 by default, and '--disable-multilib' disables support for other CPUs in the m68k family such as the 68020 and 68040. Because the Mega Drive is a bare-metal embedded platform, the '--without-headers' flag is used to alert the compiler that there is no C standard library available.

    With these settings, I was able to build a working Motorola 68000 compiler. By passing it the '-S' command line flag, I could convert C/C++ to Motorola 68000 assembly, enabling me to examine how compiler flags would influence the generation of assembly code.

    Affecting Assembly Code Generation
    By default, GCC treats 'int' as 32-bit, which does not suit the 68000 well as 32-bit operations are slower than 16-bit and 8-bit, and it has no 32-bit multiplication and division instructions. As a result of this, multiplications and divisions are achieved with calls to helper functions instead, which are incredibly slow. By passing GCC the '-mshort' flag, it can be made to treat 'int' as 16-bit instead, which results in the generation of far more natural assembly.
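
    Code that silently depends on the width of 'int' becomes fragile once '-mshort' is involved, so one option is to spell widths out with the fixed-width types from 'cstdint'. The following is a minimal sketch, not code from the toolchain: 'MultiplyWide' is a hypothetical name. A 16-bit by 16-bit multiplication with a 32-bit result is what the 68000's 'muls.w' instruction computes, so the compiler has the opportunity to avoid a helper call here, though whether it does depends on the GCC version and flags.

```cpp
#include <cstdint>

// Hypothetical helper: multiplies two 16-bit values into a 32-bit result.
// The casts matter when 'int' is 16-bit ('-mshort'): without them, the
// multiplication would be performed at 16-bit width and could overflow.
constexpr std::int32_t MultiplyWide(const std::int16_t a, const std::int16_t b)
{
    return static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b);
}

// Holds regardless of whether 'int' is 16-bit or 32-bit.
static_assert(MultiplyWide(300, 300) == 90000, "the product must not be truncated to 16 bits");
```

    Compiling a function like this with '-S', with and without '-mshort', is a quick way to see what the compiler actually emits for it.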

    When compiling code, it is important to pass the '-ffreestanding' and '-nostdlib' flags to GCC, as these prevent the compiler from trying to use the non-existent C standard library. By doing this, it is possible to compile C/C++ to object files ('.o') and even shared object files ('.so'). Compiling to executable files ('.elf') mostly works, though the linker will complain about the lack of an entry point.

    Only a small part of the C standard library is available, such as the 'stdbool.h' and 'stdint.h' headers. Because of this, it is not possible to use things like the 'strlen', 'assert', and 'qsort' functions. Likewise, there is no C++ standard library whatsoever. This is very much a 'naked' version of C and C++, where you are forced to make do with only the features of the language itself.
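
    In practice, 'making do' means writing the occasional replacement by hand. As a minimal sketch (the name 'StringLength' is hypothetical, and this is not code from the toolchain), a stand-in for 'strlen' needs nothing beyond the language itself and the freestanding 'cstddef' header:

```cpp
#include <cstddef>

// Hypothetical stand-in for 'strlen': counts characters up to, but not
// including, the terminating null character.
constexpr std::size_t StringLength(const char *string)
{
    std::size_t length = 0;

    while (string[length] != '\0')
        ++length;

    return length;
}

// Being 'constexpr', it can even be evaluated entirely at compile time.
static_assert(StringLength("SEGA") == 4, "four characters before the terminator");
```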

    It was with this build environment that I was able to start writing some Mega Drive library code in C++. In particular, I created a partial port of my modified SMPS sound driver - the Sonic 2 Clone Driver v2. By making the compiler produce position-independent code with no global state, I could make it generate a binary which I could include directly into a Sonic ROM-hack and use as a kind of binary blob.

    Normally, the generated object files are not truly position-independent, as they require relocation at runtime. This is because GCC does not use Program Counter-relative addressing by default; however, it can be made to do so by passing it the '-mpcrel' flag. The result is truly position-independent code, which can be used as-is with no relocation.

    Being an embedded platform, it is necessary to read from and write to various memory addresses, such as to access the console's YM2612 sound chip. For this, volatile pointers are necessary:

    Code (Text):
    static volatile unsigned char &YM2612_A0 = *reinterpret_cast<volatile unsigned char*>(0xA04000);
    static volatile unsigned char &YM2612_D0 = *reinterpret_cast<volatile unsigned char*>(0xA04001);
    static volatile unsigned char &YM2612_A1 = *reinterpret_cast<volatile unsigned char*>(0xA04002);
    static volatile unsigned char &YM2612_D1 = *reinterpret_cast<volatile unsigned char*>(0xA04003);

    Unfortunately, GCC's handling of volatile pointers is quite clumsy: it frequently reloads the address into a register when there is no need to. To avoid this, accessing raw memory addresses can instead be handled by inline assembly:

    Code (Text):
    void WriteFMI(const unsigned char port, const unsigned char value)
    {
        // 'YM2612' is a pointer to the sound chip's registers (0xA04000).
        asm volatile(
            "0:\n"
            "    tst.b    (%0)\n"     // 8(2/0)
            "    bmi.s    0b\n"       // 10(2/0) | 8(1/0)
            "    move.b    %1,(%0)\n"  // 8(1/1)
            "    move.b    %2,1(%0)\n" // 12(2/1)
            "    nop\n"              // 4(1/0)
            "    nop\n"              // 4(1/0)
            "    nop\n"              // 4(1/0)
            :
            : "a" (YM2612), "idQUm" (port), "idQUm" (value)
            : "cc"
        );
    }

    The syntax for this is fairly complex, but also surprisingly flexible and powerful: with clever usage of the input and output operands, it is possible to allow the compiler to inline literal inputs and to load non-literal inputs into registers and reuse them between instances of the inline assembly.

    GCC's calling convention for the 68000 is terrible: every argument is passed as a 32-bit value on the stack. Because of this, it is desirable to minimise function calls. This can be achieved by marking functions as 'static' wherever possible, allowing the compiler to inline them. For class methods and functions with external visibility, the process is a bit more complicated: link-time optimisation would suffice, but globally-visible functions and methods are considered to be 'exported' by default, meaning that they are made part of the library's API, which prevents the compiler from being able to inline them. To resolve this, the default visibility must also be changed. Enabling link-time optimisation and changing the default visibility are done by passing the '-flto' and '-fvisibility=hidden' flags, respectively. With this, slow, stack-hungry function calls will be avoided as much as possible.
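
    To make the distinction concrete, here is a minimal sketch of how a source file might separate internal helpers from the few functions that should remain exported once '-fvisibility=hidden' is in use. The names 'ClampVolume' and 'Driver_SetVolume' are hypothetical, and the 'visibility' attribute shown is the GCC/Clang extension for overriding the default on a per-function basis.

```cpp
namespace
{
    // Internal linkage (equivalent to 'static'): never part of the exported
    // API, so the compiler is free to inline it into its callers.
    constexpr unsigned char ClampVolume(const unsigned int volume)
    {
        return static_cast<unsigned char>(volume > 0x7F ? 0x7F : volume);
    }
}

// With '-fvisibility=hidden', only functions marked like this stay exported.
// 'Driver_SetVolume' is a hypothetical entry point used for illustration.
extern "C" __attribute__((visibility("default")))
unsigned char Driver_SetVolume(const unsigned int volume)
{
    return ClampVolume(volume);
}
```

    Everything not explicitly marked is then hidden, which leaves link-time optimisation free to inline it across translation units.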

    C++ Utilities
    While I did not find the lack of a C standard library to be a problem, the lack of C++ niceties such as 'std::array', 'std::min', 'std::max', and 'std::optional' was, as it meant that all of the zero-overhead abstractions that I liked so much were gone. However, a post on OSDev's forum tipped me off that it is possible to add a portion of the C++ standard library to the toolchain: GCC must be reconfigured with the '--disable-hosted-libstdcxx' flag, and then 'make all-target-libstdc++-v3 install-strip-target-libstdc++-v3' can be run to produce and install a "free-standing" C++ standard library. Now, all of the aforementioned C++ utilities were available for use!
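
    As a small illustration of what these utilities allow, the sketch below combines 'std::array', 'std::optional', and 'std::min'. The colour table and function names are hypothetical, and the sketch assumes that the free-standing library in question provides the 'array', 'optional', and 'algorithm' headers.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical table of piece colours.
constexpr std::array<unsigned char, 4> colours{2, 4, 6, 8};

// Returns the colour at 'index', or nothing if the index is out of range.
constexpr std::optional<unsigned char> ColourAt(const std::size_t index)
{
    if (index >= colours.size())
        return std::nullopt;

    return colours[index];
}

// Clamps 'index' to the final entry instead of failing.
constexpr unsigned char ColourAtClamped(const std::size_t index)
{
    return colours[std::min(index, colours.size() - 1)];
}

static_assert(ColourAt(1).value_or(0) == 4, "in-range lookup");
static_assert(!ColourAt(9).has_value(), "out-of-range lookup yields nothing");
static_assert(ColourAtClamped(9) == 8, "out-of-range index is clamped to the last entry");
```

    None of this requires heap allocation or operating system support, which is why these particular utilities suit a bare-metal target.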

    By doing all of this, I was now able to write appealing C++ code that compiled to efficient 68000 assembly code. However, I soon became interested in creating more than just position-independent libraries: I wanted to be able to make executables.

    Making Executables
    I knew from the 68 Katy's Linux kernel port that bare-metal C/C++ software needs to be linked with a small bootstrapping program. This program is typically written in assembly and is responsible for initialising the hardware and copying the contents of the executable's '.data' section to RAM before finally executing the C/C++ software. This is not too dissimilar to the usual start-up process of Mega Drive games, so I was able to create such a bootstrapping program without much trouble: the program starts with a standard 68000 vector table, followed by the standard Sega boot-code for initialising the hardware ('ICD_BLK4.PRG'), followed by an instruction that jumps to the C++ code's EntryPoint function:

    Code (Text):
        dc.l    0x00000000,.Lentry,BusErrorHandler,AddressErrorHandler
        dc.l    IllegalInstructionHandler,DivisionByZeroHandler,CHKHandler,TRAPVHandler
        dc.l    PrivilegeViolationHandler,TraceHandler,UnimplementedInstructionLineAHandler,UnimplementedInstructionLineFHandler
        dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UninitialisedInterruptHandler
        dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
        dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
        dc.l    SpuriousInterruptHandler,Level1InterruptHandler,Level2InterruptHandler,Level3InterruptHandler
        dc.l    Level4InterruptHandler,Level5InterruptHandler,Level6InterruptHandler,Level7InterruptHandler
        dc.l    TRAP0Handler,TRAP1Handler,TRAP2Handler,TRAP3Handler
        dc.l    TRAP4Handler,TRAP5Handler,TRAP6Handler,TRAP7Handler
        dc.l    TRAP8Handler,TRAP9Handler,TRAP10Handler,TRAP11Handler
        dc.l    TRAP12Handler,TRAP13Handler,TRAP14Handler,TRAP15Handler
        dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
        dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
        dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
        dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
    .Lentry:
        .incbin "ICD_BLK4.BIN"

        | Load DATA section.
        lea    (_DATA_ROM_START_).l,%a0
        lea    (_DATA_RAM_START_).l,%a1
        move.l    #_DATA_SIZE_,%d0
        move.w    %d0,%d1
        lsr.l    #4,%d0
        andi.w    #0xC,%d1
        eori.w    #0xC,%d1
        lsr.w    #1,%d1
        jmp    .Lloop(%pc,%d1.w)
    .Lloop:
        move.l    (%a0)+,(%a1)+
        move.l    (%a0)+,(%a1)+
        move.l    (%a0)+,(%a1)+
        move.l    (%a0)+,(%a1)+
        dbf    %d0,.Lloop

        | Clear BSS section.
        moveq    #0,%d2
        lea    (_BSS_START_).l,%a1
        move.l    #_BSS_SIZE_,%d0
        move.w    %d0,%d1
        lsr.l    #4,%d0
        andi.w    #0xC,%d1
        eori.w    #0xC,%d1
        lsr.w    #1,%d1
        jmp    .Lloop2(%pc,%d1.w)
    .Lloop2:
        move.l    %d2,(%a1)+
        move.l    %d2,(%a1)+
        move.l    %d2,(%a1)+
        move.l    %d2,(%a1)+
        dbf    %d0,.Lloop2

        | Jump into the user-code.
        jmp    (EntryPoint).l
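
    The data-copying and BSS-clearing steps of the routine above follow a common bare-metal start-up pattern. As a rough, portable restatement in C++ (a sketch only: the bootstrapping program itself is written in assembly, and the function names here are hypothetical), the same work amounts to the following. The assembly version additionally unrolls each loop to four longwords per iteration and jumps into the middle of the loop to handle the remainder.

```cpp
#include <cstddef>
#include <cstdint>

// Copies 'count' longwords of initialised data from its load address (ROM)
// to its run address (RAM).
constexpr void CopyLongwords(const std::uint32_t *source, std::uint32_t *destination, std::size_t count)
{
    while (count-- != 0)
        *destination++ = *source++;
}

// Zero-fills 'count' longwords, as C and C++ expect of the '.bss' section.
constexpr void ClearLongwords(std::uint32_t *destination, std::size_t count)
{
    while (count-- != 0)
        *destination++ = 0;
}

// Host-side usage example: arrays stand in for the ROM image and for RAM.
constexpr bool StartupSketchWorks()
{
    const std::uint32_t rom_image[3] = {0x11111111, 0x22222222, 0x33333333};
    std::uint32_t data[3] = {0, 0, 0};
    std::uint32_t bss[2] = {0xDEADBEEF, 0xDEADBEEF};

    CopyLongwords(rom_image, data, 3);
    ClearLongwords(bss, 2);

    return data[0] == 0x11111111 && data[1] == 0x22222222 && data[2] == 0x33333333
        && bss[0] == 0 && bss[1] == 0;
}

static_assert(StartupSketchWorks(), "data is copied and BSS is cleared");
```

    On the real hardware, this cannot itself be ordinary C++ code that relies on global variables, because it runs before those variables have been given their initial values.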

    For the bootstrapping program to be properly linked to the executable, a linker script would be needed. With a linker script, I could obtain the position and length of the '.data' section, ensure that the bootstrapping program be located at the start of the executable, specify the memory layout of the Mega Drive (so that code is expected to be located in ROM at 0x000000 and variables are expected to be located in RAM at 0xFFFF0000), and make the linker output a raw flat binary instead of an ELF executable file:

    Code (Text):
    STARTUP(bin/init.o)
    OUTPUT_FORMAT(binary)

    MEMORY
    {
        ROM (rx) : ORIGIN = 0x00000000, LENGTH = 4M
        RAM (wx) : ORIGIN = 0xFFFF0000, LENGTH = 64K
    }

    SECTIONS
    {
        .rom() : {
            *(.text)
            *(.text.*)
            *(.rodata)
            *(.rodata.*)
            . = ALIGN(4);
            *(.ctors)
            . = ALIGN(4);
            *(.init)
            . = ALIGN(4);
            *(.eh_frame)
            *(.tm_clone_table)
        } > ROM

        .ram : {
            . = ALIGN(2);
            _DATA_ROM_START_ = LOADADDR(.ram);
            _DATA_RAM_START_ = .;
            *(.data)
            *(.data.*)
            . = ALIGN(4);
            _DATA_RAM_END_ = .;

            . = ALIGN(2);
            _BSS_START_ = .;
            *(.bss)
            *(.bss.*)
            . = ALIGN(4);
            _BSS_END_ = .;

        } > RAM AT> ROM

        /DISCARD/ : {
            *(.dtors)
            *(.fini)
            *(.comment)
            *(.debug_str)
            *(.debug_line)
            *(.debug_line_str)
            *(.debug_info)
            *(.debug_abbrev)
            *(.debug_aranges)
        }
    }

    The bootstrapping program expects various symbols to be provided by the C++ code, such as 'EntryPoint', 'BusErrorHandler' and 'Level1InterruptHandler'. All except for EntryPoint are interrupt handlers, and respond to things like the vertical-blanking interrupt and various types of crashes and exceptions. These interrupt handlers need to be declared in a special way so that the compiler knows to end them with the 'rte' instruction instead of the usual 'rts' instruction. This is done by declaring the functions with '__attribute__ ((interrupt))'. All of these functions need to be declared with 'extern "C"' so that the bootstrapping program can link to them properly.
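
    A minimal sketch of what such a declaration might look like follows. The 'INTERRUPT_HANDLER' macro and the 'frame_counter' variable are hypothetical additions for illustration; the macro exists only so that the sketch also compiles on a non-68000 host, where the attribute would mean something different or be rejected. 'Level6InterruptHandler' is used because level 6 is the Mega Drive's vertical-blanking interrupt.

```cpp
// '__attribute__((interrupt))' is only meaningful here when targeting the
// 68000, so this sketch turns it into a no-op elsewhere.
#if defined(__m68k__)
    #define INTERRUPT_HANDLER __attribute__((interrupt))
#else
    #define INTERRUPT_HANDLER
#endif

// Hypothetical frame counter; 'volatile' because it changes behind the main
// loop's back.
volatile unsigned short frame_counter = 0;

// 'extern "C"' disables C++ name mangling, so that the assembly vector table
// can refer to the handler by its plain name.
extern "C" INTERRUPT_HANDLER void Level6InterruptHandler()
{
    frame_counter = frame_counter + 1;
}
```

    The main loop could then wait for 'frame_counter' to change in order to synchronise itself with the display.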

    With all that done, I was able to produce a standard Mega Drive ROM file, and I was pleasantly surprised to see that it successfully booted in my Mega Drive emulator. After that, I worked on creating a library for interfacing with the Mega Drive hardware, such as reading the controllers and uploading graphical data to the video display processor. That, in turn, allowed me to make the homebrew more elaborate, eventually developing into a Columns-like block stacking puzzle game.

    As I was making this homebrew, I encountered a peculiar error: "symbol '__umodsi3' undefined". This is one of those helper functions that I mentioned earlier, which implements 32-bit modulo because the 68000 lacks such an instruction. These functions are provided by libgcc, which can be compiled and installed similarly to the libstdc++-v3 library from before. Instructions for this can be found in the OSDev article. Even with libgcc installed, however, the error persisted; this is because, unlike libstdc++-v3, libgcc needs to be linked to the program ('-lgcc'). With that done, the error was finally gone.
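
    To illustrate where such helper calls come from, and how they can sometimes be sidestepped, here is a sketch with hypothetical function names. The comments describe what the compiler is expected to do on the 68000; the exact instructions emitted depend on the GCC version and flags, so the output of '-S' remains the final authority.

```cpp
#include <cstdint>

// 32-bit modulo: the 68000 has no instruction for this, so GCC is expected to
// emit a call to a libgcc helper such as '__umodsi3'.
constexpr std::uint32_t WrapSlow(const std::uint32_t value, const std::uint32_t limit)
{
    return value % limit;
}

// 16-bit modulo: within reach of the 68000's 'divu.w' instruction, which
// divides by a 16-bit divisor and yields a 16-bit quotient and remainder.
constexpr std::uint16_t WrapFast(const std::uint16_t value, const std::uint16_t limit)
{
    return static_cast<std::uint16_t>(value % limit);
}

// When the limit is a power of two, a bitwise AND avoids division entirely.
constexpr std::uint16_t WrapPowerOfTwo(const std::uint16_t value, const std::uint16_t limit)
{
    return static_cast<std::uint16_t>(value & (limit - 1));
}

static_assert(WrapSlow(100000, 7) == 5, "32-bit modulo");
static_assert(WrapFast(1000, 7) == 6, "16-bit modulo");
static_assert(WrapPowerOfTwo(1000, 16) == 8, "power-of-two wrap");
```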

    Supporting Global Constructors
    As is typical for a puzzle game, the first piece that the player is given should be random. To implement this, I had a global variable that represented the colour of the current piece, and I initialised it using a call to the game's RandomColour function. I expected such code to produce a compiler error, as it does in C, but C++ actually allows this. Yet, when I ran the game, the piece would always be the default colour. I could even booby-trap the RandomColour function to crash the game, and yet it would not. Clearly, the variable was not being initialised properly.

    Leave it to the OSDev wiki to save the day again! In another article, it is detailed that a few extra assembly files need to be written and linked, and the bootstrapping code needs to call a function called '_init', which will in turn call the global constructors. The instructions given are mainly for x86 platforms, but it is easy enough to fill in the gaps for the 68000. Here are the 68000 versions of the 'crti.s' and 'crtn.s' files:

    Code (Text):
    .section .init
    .global _init
    _init:

    .section .fini
    .global _fini
    _fini:
    Code (Text):
    .section .init
        rts

    .section .fini
        rts

    With this done, RandomColour was finally being called and the piece was set to a random colour at the start of the game!
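
    For reference, the kind of code that needs this machinery looks roughly like the sketch below. It is a reconstruction for illustration rather than the game's actual source: the generator, its constants, and every name other than 'RandomColour' are hypothetical.

```cpp
#include <cstdint>

constexpr unsigned int colour_count = 6; // Hypothetical number of piece colours.

// One step of a simple linear congruential generator (illustrative constants).
constexpr std::uint32_t NextRandom(const std::uint32_t state)
{
    return state * 1103515245u + 12345u;
}

// Constant-initialised: the value is known at compile time, so it is stored
// directly in the '.data' section and needs no global constructor.
std::uint32_t random_state = 1;

unsigned int RandomColour()
{
    random_state = NextRandom(random_state);
    return static_cast<unsigned int>((random_state >> 16) % colour_count);
}

// Dynamically-initialised: the value comes from a function call at runtime,
// so the compiler emits a global constructor that must be run before the
// program's entry point.
unsigned int current_colour = RandomColour();
```

    Without the call to '_init', 'current_colour' would simply keep whatever value the start-up code left in its memory, which matches the symptom of the piece always being the default colour.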

    Because Mega Drive games are incapable of exiting, there is no need to implement global destructors. As a result, the linker script discards anything from the '.dtors' and '.fini' sections.

    At this point, the toolchain seems to be very complete, at least as far as free-standing C++ compilers go.

    Writing C++ is way easier than writing assembly, since I do not have to worry about register allocation and implementing complex algorithms with long lists of dual-operand instructions. The code-generation can be sub-par ('compilers write better assembly than humans', my arse), but if you are mindful to work around GCC's quirks, you can still get it to produce very efficient assembly.

    I have always wondered about the process of turning object files into executables and how they are executed, so to finally explore this subject has been a treat! For the longest time, C and C++ existed in a bubble to me: while I understood the languages themselves, I did not understand the environment in which they operated, and yet this was the complete opposite of how I understood assembly, as I knew the process of initialising the Mega Drive all the way down to the 68000's vector table. With this knowledge, I can finally bridge the gap between the two, and now understand how to initialise a system with assembly, and then run C/C++ code!
    Last edited: Jun 25, 2024
  2. BenoitRen


    Tech Member
    I'm interested in seeing some examples of this. :)
  3. Clownacy


    Tech Member
    Code (C++):
    void badfill()
    {
        array.fill(0x55555555);
    }
    Code (ASM):
    badfill():
            lea array,%a0
    .L15:
            move.l #1431655765,(%a0)+
            cmp.l  #array+1024,%a0
            jne    .L15
            rts
    Assembly (with '-funroll-loops'):
    Code (ASM):
    badfill():
            lea    array,%a0
    .L50:
            move.l %a0,%a1
            move.l #1431655765,(%a1)+
            move.l #1431655765,(%a1)+
            move.l #1431655765,(%a1)
            move.l #1431655765,12(%a0)
            move.l #1431655765,16(%a0)
            move.l #1431655765,20(%a0)
            move.l #1431655765,24(%a0)
            move.l #1431655765,28(%a0)
            lea    (32,%a0),%a0
            cmp.l  #array+1024,%a0
            jne    .L50
            rts
    Assembly (hand-written):
    Code (ASM):
    goodloop:
        lea      array,a0
        move.l   #$55555555,d0
        moveq    #$100/8-1,d1
    .Lloop:
        move.l   d0,(a0)+
        move.l   d0,(a0)+
        move.l   d0,(a0)+
        move.l   d0,(a0)+
        move.l   d0,(a0)+
        move.l   d0,(a0)+
        move.l   d0,(a0)+
        move.l   d0,(a0)+
        dbf      d1,.Lloop
        rts
    Code (C++):
    static volatile unsigned char* const YM2612 = reinterpret_cast<volatile unsigned char*>(0xA04000);
    #define YM2612_A0 YM2612[0]
    #define YM2612_D0 YM2612[1]
    #define YM2612_A1 YM2612[2]
    #define YM2612_D1 YM2612[3]

    static volatile unsigned short &bus_request = *reinterpret_cast<volatile unsigned short*>(0xA11100);

    void WriteFMI(const unsigned char address, const unsigned char data)
    {
        bus_request = 0x100;
        while ((bus_request & 1) != 0);

        while ((YM2612_A0 & 0x80) != 0);
        YM2612_A0 = address;
        YM2612_D0 = data;

        bus_request = 0;
    }
    Code (ASM):
    WriteFMI(unsigned char, unsigned char):
            move.l %d2,-(%sp)
            move.w 8(%sp),%d2
            move.w 10(%sp),%d1
            move.w #256,10555648
    .L62:
            move.w 10555648,%d0
            btst   #0,%d0
            jne    .L62
    .L63:
            move.l #10502144,%a0
            move.b (%a0),%d0
            jmi    .L63
            move.b %d2,(%a0)
            move.b %d1,10502145
            move.w #0,10555648
            move.l (%sp)+,%d2
            rts
    Assembly (hand-written):
    Code (ASM):
    WriteFMI:
        lea     ($A11100).l,a0
        move.w  #$100,(a0)
        lea     ($A04000).l,a1
        moveq   #0,d0
    .Lloop1:
        btst    d0,(a0)
        bne.s   .Lloop1

    .Lloop2:
        tst.b   (a1)
        bmi.s   .Lloop2

        move.b  9(sp),(a1)
        move.b  11(sp),1(a1)

        move.w  d0,(a0)
        rts
  4. Pexs


    Otherwise known as Spex Member
    Wow, this is beyond impressive! The fact that all these years of reverse engineering and modding let you skip so many of the "complex bits" like programming bootstrappers is fascinating.

    Do you think you'll continue pushing your compiler to the levels of SGDK, or will it mostly exist as a tech exercise of sorts? Like, what are you going to do next?
  5. Clownacy


    Tech Member
    It's just a technical exercise: I think that the knowledge of how to create the toolchain is more valuable than the toolchain itself, which is why I've posted a big write-up but not uploaded the toolchain itself anywhere.

    As for what I'll do next, I may finish converting my Sonic 2 Clone Driver v2 to C++ and rely on this toolchain to build it. The driver is much more maintainable in C++ than it is in assembly, since I can rely on classes, inheritance, templates, and std::variant to streamline the process of creating and using multiple types of track. If I do that, then I'll need to start releasing builds of the toolchain so that people can compile the driver with it. That or I port the driver to SGDK. I don't often create homebrew, so the toolchain's ability to create whole ROMs is of limited use to me. I might be able to figure out a way to make the Sonic disassemblies produce a binutils-compatible object file, which would allow me to link it with C++ object files, making it possible to use C++ code in Sonic hacks. I don't have all the details for that figured out, though, so I can't say for sure that that will ever materialise.
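
    To give a rough idea of how 'std::variant' can tidy up the handling of multiple track types, here is a heavily simplified sketch. Every name in it is hypothetical and it does not reflect the driver's real classes; it assumes that the 'variant' header is available in the toolchain's C++ standard library.

```cpp
#include <variant>

// Hypothetical track types, each free to carry its own state.
struct FMTrack  { unsigned char channel; };
struct PSGTrack { unsigned char channel; };

// A track is exactly one of the above, with no heap allocation involved.
using Track = std::variant<FMTrack, PSGTrack>;

enum class Chip { YM2612, PSG };

// One overload per track type; 'std::visit' picks the matching one.
struct TargetChipVisitor
{
    constexpr Chip operator()(const FMTrack&) const { return Chip::YM2612; }
    constexpr Chip operator()(const PSGTrack&) const { return Chip::PSG; }
};

constexpr Chip TargetChip(const Track &track)
{
    return std::visit(TargetChipVisitor{}, track);
}

static_assert(TargetChip(Track{FMTrack{0}}) == Chip::YM2612, "FM tracks target the YM2612");
static_assert(TargetChip(Track{PSGTrack{0}}) == Chip::PSG, "PSG tracks target the PSG");
```

    Forgetting to handle a track type in the visitor is a compile-time error, which is the kind of safety net that assembly cannot offer.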
    Last edited: Jun 23, 2024
  6. Cooljerk


    Professional Electromancer Oldbie
    Great tutorial!

    If GDB is built with Python, you can link it up with an IDE like Qt Creator and have live source debugging with clickable breakpoints and such. You need a 68k GDB stub to do this, but the Atari Jaguar scene has a relevant source just for that which maps things to the appropriate vectors. Then, with BlastEm, you can enable remote GDB debugging to have a completely integrated development environment. You need to add the -g flag to your GCC options so debug symbols and labels aren't stripped. If anybody is interested, I can give more detail on this.
  7. Kilo


    That inbetween sprite from S&K's title screen Tech Member
    S1 - Metal Sonic's Challenge, Sonic 1 Rev01 ASMX Disasm
    When are we getting a JRE and Java to the Mega Drive, or .NET and C#? :V
  8. Cooljerk


    Professional Electromancer Oldbie
  9. Kilo


    That inbetween sprite from S&K's title screen Tech Member
    S1 - Metal Sonic's Challenge, Sonic 1 Rev01 ASMX Disasm
    Well then. Let's get to it, folks!