I've always been interested in doing things 'right', so Clownacy's articles on C were of great interest to me. In preparation for beginning in earnest on a C port of Sonic & Knuckles Collection, I decided to make this thread.

Things I know to keep in mind:
- char is one byte, and at least 8 bits
- short is at least 16 bits
- int is at least 16 bits
- long is at least 32 bits
- don't use C99 fixed-size types
- don't use unions
- don't assume endianness

What worries me at the moment is the free part (actfree) of Sonic games' sprite status table. The way it is used makes assumptions about the size of the data. I've also just read that a pointer isn't always the same size as an integer, and may even differ in size depending on what it points to. Yikes! Anything else I should keep in mind?
'char' is signed or unsigned depending on the target CPU. For instance, it is signed on x86 and unsigned on ARM. Do not assume that 'char' is signed. '#pragma once' is not standard C; use manual header guards instead. Bit-shifting signed integers is not well-defined, so multiply and divide them by a power of two instead, if you must.

In my Sonic 2 C port, I made the free part of the sprite status table into a union of arrays of various types, like this:

Code (Text):

union
{
	// 0x16 bytes of scratch RAM, to be done with as the object pleases
	struct
	{
		uint8_t objoff_2A, objoff_2B, objoff_2C, objoff_2D;
		uint8_t objoff_2E, objoff_2F, objoff_30, objoff_31;
		uint8_t objoff_32, objoff_33, objoff_34, objoff_35;
		uint8_t objoff_36, objoff_37, objoff_38, objoff_39;
		uint8_t objoff_3A, objoff_3B, objoff_3C, objoff_3D;
		uint8_t objoff_3E, objoff_3F;
	} scratch8u;

	struct
	{
		int8_t objoff_2A, objoff_2B, objoff_2C, objoff_2D;
		int8_t objoff_2E, objoff_2F, objoff_30, objoff_31;
		int8_t objoff_32, objoff_33, objoff_34, objoff_35;
		int8_t objoff_36, objoff_37, objoff_38, objoff_39;
		int8_t objoff_3A, objoff_3B, objoff_3C, objoff_3D;
		int8_t objoff_3E, objoff_3F;
	} scratch8s;

	struct
	{
		uint16_t objoff_2A, objoff_2C, objoff_2E, objoff_30;
		uint16_t objoff_32, objoff_34, objoff_36, objoff_38;
		uint16_t objoff_3A, objoff_3C, objoff_3E;
	} scratch16u;

	struct
	{
		int16_t objoff_2A, objoff_2C, objoff_2E, objoff_30;
		int16_t objoff_32, objoff_34, objoff_36, objoff_38;
		int16_t objoff_3A, objoff_3C, objoff_3E;
	} scratch16s;

	struct
	{
		union
		{
			struct
			{
				uint16_t filler1;
				uint32_t objoff_2C, objoff_30, objoff_34, objoff_38, objoff_3C;
			};
			struct
			{
				uint32_t objoff_2A, objoff_2E, objoff_32, objoff_36, objoff_3A;
				uint16_t filler2;
			};
		};
	} scratch32u;

	struct
	{
		union
		{
			struct
			{
				int16_t filler1;
				int32_t objoff_2C, objoff_30, objoff_34, objoff_38, objoff_3C;
			};
			struct
			{
				int32_t objoff_2A, objoff_2E, objoff_32, objoff_36, objoff_3A;
				int16_t filler2;
			};
		};
	} scratch32s;
};

In hindsight, I think this was a terrible idea: not only does it lead to difficult-to-read code, but it uses C99's fixed-width integer types. Instead, I think a better solution would be to make this so-called "scratch RAM" an array of unsigned chars, and use getters and setters to pack integers into these bytes. That way, regardless of the target CPU, a longword will always use 4 bytes, just like on a Mega Drive.
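The getters and setters might look something like this (a rough sketch, not actual code from my port; the struct and macro names are hypothetical, and the macros assume an 'sst' object pointer is in scope):

Code (Text):

typedef struct Object
{
	/* ...the fixed part of the sprite status table... */
	unsigned char scratch[0x16]; /* SST offsets 0x2A to 0x3F */
} Object;

#define SCRATCH(offset) (sst->scratch[(offset) - 0x2A])

#define GetScratchRAM_U8(offset) ((unsigned char)SCRATCH(offset))
#define SetScratchRAM_S8(offset, value) (SCRATCH(offset) = (unsigned char)(value))

/* Multi-byte values are packed big-endian, matching the 68000, so a
   word always spans exactly 2 bytes no matter the host CPU */
#define GetScratchRAM_U16(offset) \
	(((unsigned int)SCRATCH(offset) << 8) | SCRATCH((offset) + 1))
#define SetScratchRAM_U16(offset, value) \
	(SCRATCH(offset) = ((value) >> 8) & 0xFF, \
	 SCRATCH((offset) + 1) = (value) & 0xFF)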
Code (Text):

#define timedelay      0x30
#define already_fired  0x31
#define near_sonic     0x32

ObjectMove(sst);

if (GetScratchRAM_U8(already_fired) || GetScratchRAM_U8(near_sonic))
	return;

int player_distance = MainCharacter.x_pos - sst->x_pos;

if (player_distance < 0)
	player_distance = -player_distance;

if (player_distance >= 0x60 || !sst->render_flags.on_screen)
	return;

SetScratchRAM_S8(near_sonic, true);
SetScratchRAM_S8(already_fired, false);
SetScratchRAM_S8(timedelay, 29);

Alternatively, if you prioritise performance and code simplicity over having the RAM perfectly match a Mega Drive, you could make the scratch RAM a union between various structs, one for each object, like this:

Code (Text):

union
{
	struct
	{
		signed char timedelay;
		unsigned char already_fired;
		unsigned char near_sonic;
	} buzz_bomber;

	struct
	{
		unsigned char timesr[2];
	} whisp;
} scratch;

I don't know of any portability problems with using unions, so long as you're not using them to store data as one type and retrieve it as another.
Weird that the standard leaves the default signedness up for interpretation when it comes to char, but not the other numeric types: an int is always a signed int. What this tells me is that one should avoid "char" and use "signed char" and "unsigned char" instead. From my experience with decompiling Sonic CD, unions are used to read and write individual bytes of larger values, which I think counts as storing data as one type and retrieving it as another. At the very least, this way of working leads to endianness problems. I was thinking that an alternative would be to use bit-shifting and masks to get at an individual byte instead, but if bit-shifting signed values is problematic, that won't always be an option.

Side note: I just read that C++11 goes one step further and says that left-bit-shifting signed integers is undefined, while right-bit-shifting signed integers is implementation-defined.
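Thinking about it some more, the shift itself can be kept well-defined: converting a signed value to unsigned is defined as wrapping modulo 2^N no matter how the CPU represents negatives, and shifting an unsigned value is always fine. A sketch:

Code (Text):

/* Extract byte 'n' (0 = least significant) of an int's value.
   The conversion to unsigned is defined as modulo 2^N, and the
   shift of an unsigned value is well-defined, so this behaves
   the same on every conforming compiler. ('n' must stay below
   sizeof(int).) */
unsigned char get_byte(int value, unsigned int n)
{
	return ((unsigned int)value >> (n * 8)) & 0xFF;
}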
What is wrong with the standard fixed-width types (e.g. int32_t)? Don't those achieve what you want? (guaranteed size for a given integral type)
After doing research, I have to say it makes sense that bit-shifting negative numbers is not defined. We're used to a sign bit and two's complement, but there are more ways to implement negative numbers, and C gives that freedom.

I found the following code to convert an unsigned number to a signed number in a portable way:

Code (Text):

if (ret <= INT_MAX)
	ret_as_signed = ret;
else
	ret_as_signed = -(int)(UINT_MAX - ret) - 1;

The other way around is basically taking the absolute value:

Code (Text):

if (ret < 0)
	return ret * -1; /* overflows (and is undefined) for INT_MIN */
else
	return ret;

Read this: Stop Using Fixed-Width Integer Types!
It 404s, or actually all browsers complain it's an insecure site. Maybe you could just paste what I'm meant to read.
This might be me speaking out of my rear end, but under what circumstances will anyone, in the year of our Lord 2024, use a CPU that does not use two's complement for signed number representation, other than enthusiasts messing around with really old systems, or specialized systems that aren't the type of thing you'd play a video game about a blue hedgehog on? And the next version of C is planning to remove all signed number representations other than two's complement anyway (PDF).
I meant you could've just shared a sentence about it. Anyway, clownacy's site is blocked on secure/academic Wi-Fi, apparently. I have to go "off the grid."

EDIT: okay, then use the fast variants of the standard types. The issue there is that sometimes you need to guarantee the exact same data layout when working between CPU code and GPU code. Then you have to assert that the sizes are the same, or you have to know your hardware better.
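For example (a toy sketch; 'GpuSprite' is a name I just made up), C11's static_assert can check the layout assumption at compile time:

Code (Text):

#include <stdint.h>
#include <assert.h> /* provides C11's static_assert macro */

/* A struct shared verbatim with GPU code: here the layout must be
   exact, so fixed-width types plus a compile-time size check */
typedef struct
{
	int32_t x, y;
	uint32_t flags;
} GpuSprite;

static_assert(sizeof(GpuSprite) == 12, "GpuSprite must match the GPU-side layout");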
I don't know, but it doesn't matter. I was just explaining the reasoning behind bit-shifting signed integers being implementation-defined, which means we can't rely on compilers implementing it in the same way, even if they do use two's complement for signed numbers. By the way, I've since read that C++20 already mandates that signed numbers use two's complement.
In order to make Sonic & Knuckles Collection's code adhere to all these constraints, you're basically going to have to rewrite it entirely.
Maybe not everything, but there is certainly a lot of pointer arithmetic based on exact data sizes, storing of data as one size and reading it back as another, signed/unsigned arithmetic shenanigans, etc. Quite a few things will need adjustment if the 68000 RAM block is not exactly 0x10000 bytes and aligned on such a boundary. How do you read something like S3K's level layout header in a platform-agnostic way?
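The only approach I can think of (a generic sketch on my part, not S&KC's actual code) is to stop overlaying structs on the data and instead assemble values from individual bytes, so host endianness never enters into it:

Code (Text):

/* Read a big-endian 16-bit value from a byte buffer. The 68000 is
   big-endian, so S3K's level data is too; assembling the value
   arithmetically gives the same result on any host CPU. */
unsigned int read_be16(const unsigned char *bytes)
{
	return ((unsigned int)bytes[0] << 8) | (unsigned int)bytes[1];
}

Reading the header then becomes a series of read_be16 calls at known offsets.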
If the memory needs to be the same as the Mega Drive's, I've already got a problem. Just like in Sonic Jam, the sprite status tables take 2 more bytes each, as they need to align on a 4-byte boundary.
The reason that the sprite status table has to align on a 4-byte boundary is that its first data member is a 4-byte memory pointer. I researched whether this was also a requirement on the 68000 CPU, but it looks like the only requirement there is that the value is aligned on a word (2-byte) boundary. x86 does support unaligned memory access, which is probably why the machine-translated ASM works fine. PowerPC does as well, but for ARM it varies. So, if I really wanted to keep the memory layout the same, I'd have to replace the pointer with an array of four chars, which I'd cast to a pointer when reading or writing.
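Actually, casting the char array would trip over the very alignment rules above, plus strict aliasing; copying through memcpy is the well-defined way to do the same thing. A rough sketch, assuming 4-byte pointers (which is the fragile part) and made-up names:

Code (Text):

#include <string.h>

typedef void (*ObjectRoutine)(void);

typedef struct
{
	/* Was a 4-byte 68000 pointer; this layout only works on
	   hosts whose function pointers are also 4 bytes */
	unsigned char code_bytes[4];
	/* ...the rest of the sprite status table... */
} SpriteStatus;

static ObjectRoutine GetCodePointer(const SpriteStatus *sst)
{
	ObjectRoutine code;
	memcpy(&code, sst->code_bytes, sizeof(code)); /* sizeof(code) must be 4 */
	return code;
}

static void SetCodePointer(SpriteStatus *sst, ObjectRoutine code)
{
	memcpy(sst->code_bytes, &code, sizeof(code));
}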
That'd fail anyway as soon as you try porting it to a platform that doesn't use 32-bit pointers. Like I said, there are going to have to be a lot of rewrites if you want to make it truly portable, because there are almost no guarantees when it comes to different CPUs. Technically, even 8 bits being a basic unit of memory isn't a universal truth; it's just something all the current CPU manufacturers agreed on.
If you're targeting modern-day, general-purpose, consumer hardware then there's no reason not to use fixed-width integer types, if that's what your data type calls for. The article seems to imply that there's some kind of negative performance impact involved in trying to use 16-bit arithmetic on a >=32-bit platform. This is simply not the case for general-purpose desktop or mobile hardware.

If you find that you're constantly having to write code to manually mask an unsigned value to 16 bits, then that's a good indication that you should probably be using a uint16_t. The compiler will add the masking for you, reducing the chance of you making mistakes, and ensure the correct overflow behaviour (signed overflow is UB - don't do that). Both x64 and Arm have instructions for masking and sign-/zero-extension that are single-cycle and can be parallel-issued (ignoring data dependency hazards). Memory alignment, cache locality, branch prediction and instruction scheduling are going to be performance concerns long before bitmasking a register is even a blip on the profiler's radar.

I specifically mentioned "general purpose", multiple times, above because outside of those platforms YMMV. If you're writing code for a DSP, for example, then you're going to have to work with whatever the hardware gives you, and you may or may not have fixed-width types in your stdlib. The portability argument stands in this case, but do you really need your general-purpose desktop program to also be portable to a DSP? Probably not. Be pragmatic, not dogmatic.
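To illustrate the masking point with a toy example of my own:

Code (Text):

#include <stdint.h>

/* With a plain unsigned int, the 16-bit wrap-around has to be done
   by hand at every step, and forgetting it is a silent bug: */
unsigned int angle_manual(unsigned int angle, unsigned int delta)
{
	return (angle + delta) & 0xFFFF;
}

/* With uint16_t, the wrap-around is part of the type; the compiler
   truncates the result to 16 bits on return: */
uint16_t angle_fixed(uint16_t angle, uint16_t delta)
{
	return angle + delta;
}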
If you're compiling for Windows, macOS, Linux, BSD, Android, Haiku, etc. with clang, gcc or msvc then there's a very, *very* good chance that they do exist. That's why I mentioned modern, general-purpose platforms, and it's also why I added that last paragraph. If you need to target a platform where they're not available then, obviously, you can't use them. If you're targeting platforms where they _are_ available and their usage reduces the chance of bugs, and communicates the intention of code more clearly, then why not use them?
The thread is about writing portable C. If I targeted a subset of platforms, it wouldn't be portable.