How to translate basic 68k assembly to basic c code

Discussion in 'Engineering & Reverse Engineering' started by Chris Pancake, Dec 23, 2017.

  1. Chris Pancake

    This is my first topic!
    Unfortunately, I was unable to find the C equivalents of bset, bclr, bchg, bvc and bvs... No, it was just laziness.
    If you want to know how to do ROL (bit rotate left) and ROR (bit rotate right), look: there.
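
    For reference, here is a minimal sketch of how a 32-bit rotate is commonly written in C. The names rol32 and ror32 are made up for this example, and the sketch ignores the carry and extend flags that the real instructions also update.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (roughly rol.l #n, dN). */
uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;                /* keep the count in the range 0..31 */
    if (n == 0) return x;   /* avoid a shift by 32, which is undefined in C */
    return (x << n) | (x >> (32 - n));
}

/* Rotate a 32-bit value right by n bits (roughly ror.l #n, dN). */
uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    if (n == 0) return x;
    return (x >> n) | (x << (32 - n));
}

/* Usage: the bit that falls off one end comes back in at the other. */
void rotate_example(void)
{
    assert(rol32(0x80000001u, 1) == 0x00000003u);
    assert(ror32(0x00000003u, 1) == 0x80000001u);
}
```

    Using an unsigned type matters here: shifting a negative signed value is not portable in C.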

    Variable width
    Most 68k assembly instructions end with a ".something" size suffix, which can be:
    .b: BYTE (8-bit, int8_t, char)
    .w: WORD (16-bit, int16_t, short)
    .l: LONG (32-bit, int32_t, int (on 32-bit and 64-bit systems), long (on 16-bit systems))

    The 68000 has 32-bit registers and a 16-bit data bus, so there's no native 64-bit integer.
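
    One thing the suffix implies, as far as I understand the 68000: a .b or .w write to a data register only changes the low 8 or 16 bits and leaves the rest alone. A rough sketch of that in C, with helper names (write_b, write_w) invented for this example:

```c
#include <assert.h>
#include <stdint.h>

/* move.b src, dN: only the low 8 bits of the data register change. */
uint32_t write_b(uint32_t reg, uint8_t value)
{
    return (reg & 0xFFFFFF00u) | value;
}

/* move.w src, dN: only the low 16 bits of the data register change. */
uint32_t write_w(uint32_t reg, uint16_t value)
{
    return (reg & 0xFFFF0000u) | value;
}

/* Usage: the upper part of d0 survives both writes. */
void width_example(void)
{
    uint32_t d0 = 0xDEADBEEFu;
    d0 = write_b(d0, 0x12);
    assert(d0 == 0xDEADBE12u);
    d0 = write_w(d0, 0x3456);
    assert(d0 == 0xDEAD3456u);
}
```

    If you translate a register to a plain int8_t or int16_t variable instead, you lose those upper bits, so this only matters when later code reads the register at a wider size.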

    Local variables
    Consider the following 68k code:
    Code (Text):
        Generic:
            move.l #hello, d0
            move.l #world, d1
    Here we're moving the value of the constant 'hello' into register d0 and the value of the constant 'world' into register d1.
    (move long #hello to d0)
    (move long #world to d1)

    Translating it we get this:
    Code (Text):
        int32_t d0;
        int32_t d1;
        d0 = hello;
        d1 = world;
    If you are wondering about the register keyword: it asks the compiler to keep the variable in a register (if possible) instead of allocating it on the stack. The keyword is not needed in a release build, though (which normally uses the O2 build option).
    Never mind, it shows a warning with a c17 target.

    Conditions
    Instructions:
    cmp.something y, z: "Compare register z with register y" (computes z minus y and sets the flags; z is not changed).
    cmpi.something y, z: "Compare register z with the immediate value y"
    tst.something y: "Test if register y is zero"

    Signed branches (after cmp y, z):
    beq x: branch (goto x) if z and y are equal.
    bne x: branch (goto x) if z and y are not equal.
    bge x: branch (goto x) if z is greater than or equal to y.
    bgt x: branch (goto x) if z is greater than y.
    ble x: branch (goto x) if z is less than or equal to y.
    blt x: branch (goto x) if z is less than y.

    Unsigned branches (after cmp y, z):
    bcc x: branch (goto x) if z is greater than or equal to y.
    bhi x: branch (goto x) if z is greater than y.
    bls x: branch (goto x) if z is less than or equal to y.
    bcs x: branch (goto x) if z is less than y.
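
    The signed/unsigned split matters because the same bits can compare differently. A small sketch, assuming cmp.l y, z with z as the destination operand; the function names are made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* bgt after cmp.l y, z: taken when z > y, both read as signed values. */
int taken_bgt(uint32_t z, uint32_t y)
{
    /* The casts assume the usual two's complement conversion. */
    return (int32_t)z > (int32_t)y;
}

/* bhi after cmp.l y, z: taken when z > y, both read as unsigned values. */
int taken_bhi(uint32_t z, uint32_t y)
{
    return z > y;
}

/* Usage: 0xFFFFFFFF is -1 when signed and 4294967295 when unsigned. */
void branch_example(void)
{
    assert(taken_bgt(0xFFFFFFFFu, 1u) == 0);
    assert(taken_bhi(0xFFFFFFFFu, 1u) == 1);
}
```

    So when translating, pick int32_t or uint32_t for the variables depending on which family of branches the original code uses on them.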

    Consider the following code:
    Code (Text):
        Generic:
            move.l #hello, d0
            move.l #world, d1
            cmp.l d0, d1
            beq.s Generic2
            move.l #$0, d0
        Generic2:
            move.l #$0, d1
    Translating, we get:
    Code (Text):
        int d0 = hello;
        int d1 = world;

        if (d0 == d1)
        {
            goto Generic2;
        }
        d0 = 0;
        Generic2:
        d1 = 0;
    ...This works, but there's a useless label in there, which makes the code ugly (OH NO). To get rid of it, we negate the condition of the compare! Look:
    Code (Text):
        int d0 = hello;
        int d1 = world;

        if (d0 != d1)
        {
            d0 = 0;
        }

        d1 = 0;

    Addition and subtraction
    It's easy.
    Instructions:
    add.something y, z: "Add register y to register z".
    addi.something y, z: "Add the immediate value y to register z"
    add.something x, y(z): "Add register x to the value at address (y + z)"
    addi.something x, y(z): "Add the immediate value x to the value at address (y + z)"
    sub.something y, z: "Subtract register y from register z".
    subi.something y, z: "Subtract the immediate value y from register z"
    sub.something x, y(z): "Subtract register x from the value at address (y + z)"
    subi.something x, y(z): "Subtract the immediate value x from the value at address (y + z)"

    The following 68k code:
    Code (Text):
        add.b d0, d0
        add.b #$0001, d0
        add.b d0, (a0)
        add.b #$0001, (a0)
        sub.b d0, d0
        subi.b #$0001, d0
        sub.b d0, (a0)
        subi.b #$0001, (a0)
    Translated should be like this:
    Code (Text):
        char d0;
        char* a0;

        d0 += d0;
        d0 += 0x0001;
        *a0 += d0;
        *a0 += 0x0001;
        d0 -= d0;
        d0 -= 0x0001;
        (*a0) -= d0;
        (*a0) -= 0x0001;
    Don't forget that a POINTER points to an ADDRESS.
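
    The y(z) form from the list above maps to indexing off a pointer. A minimal sketch, assuming a byte-sized operation such as add.b d0, 4(a0); the function name add_b_disp and the ram array are invented for this example:

```c
#include <assert.h>
#include <stdint.h>

/* add.b d0, disp(a0): add the low byte of d0 to the byte at address a0 + disp. */
void add_b_disp(uint8_t d0, uint8_t *a0, int disp)
{
    /* The cast makes the result wrap modulo 256, like a byte-sized add does. */
    a0[disp] = (uint8_t)(a0[disp] + d0);
}

/* Usage: a small array stands in for RAM. */
void add_example(void)
{
    uint8_t ram[8] = {0};
    ram[4] = 0xFF;
    add_b_disp(0x02, ram, 4);
    assert(ram[4] == 0x01);   /* 0xFF + 0x02 wraps around to 0x01 */
    assert(ram[3] == 0x00);   /* neighbouring bytes are untouched */
}
```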

    The clear, swap and exchange instructions
    Instructions:
    clr.something y: sets y to zero. (something should be l, w or b)
    swap x: swaps the upper word (0xXXXX0000) with the lower word (0x0000XXXX) of register x.
    exg y, z: swaps register y with register z (always all 32 bits)

    clr is basically:
    Code (Text):
        x = 0; // Clear x
    Swap is just a shift trick, it's like this:
    Code (Text):
        x = ((x & 0xFFFF0000) >> 16) | (x << 16); // x should be an unsigned 32-bit type (uint32_t)
    (x & 0xFFFF0000) >> 16 ANDs x with 0xFFFF0000 and then shifts the result right by 16 bits, so if the value of x is 0xDEADBEEF, the result is 0x0000DEAD.
    (x << 16) shifts x left by 16 bits, so if the value of x is 0xDEADBEEF, the result is 0xBEEF0000.
    After doing the two operations, you "OR (|)" them together to get the final result. (0xBEEF0000 | 0x0000DEAD = 0xBEEFDEAD);
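
    Put together as functions, swap and exg could look roughly like this. The names swap_words and exg_regs are made up for this example, and the registers are assumed to be plain uint32_t variables:

```c
#include <assert.h>
#include <stdint.h>

/* swap dN: exchange the upper and lower 16-bit halves of a register. */
uint32_t swap_words(uint32_t x)
{
    return (x >> 16) | (x << 16);
}

/* exg dM, dN: exchange the contents of two 32-bit registers. */
void exg_regs(uint32_t *a, uint32_t *b)
{
    uint32_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Usage, with the same value as in the walkthrough above. */
void swap_example(void)
{
    uint32_t d0 = 0xDEADBEEFu, d1 = 0x00000001u;
    assert(swap_words(d0) == 0xBEEFDEADu);
    exg_regs(&d0, &d1);
    assert(d0 == 0x00000001u && d1 == 0xDEADBEEFu);
}
```

    With an unsigned x the mask before the right shift is not needed, because the shift already fills the top with zeros.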

    Bitwise operators
    Another easy one.
    Instructions:
    not.something y: inverts every bit of register y (~y, not !y).

    and.something y, z: ANDs register y into register z.
    andi.something y, z: ANDs the immediate value y into register z.
    and.something x, y(z): ANDs register x into the value at the address in register z + displacement y.
    andi.something x, y(z): ANDs the immediate value x into the value at the address in register z + displacement y.

    or.something y, z: ORs register y into register z.
    ori.something y, z: ORs the immediate value y into register z.
    or.something x, y(z): ORs register x into the value at the address in register z + displacement y.
    ori.something x, y(z): ORs the immediate value x into the value at the address in register z + displacement y.

    eor.something y, z: XORs register y into register z (the 68k mnemonic is eor, not xor).
    eori.something y, z: XORs the immediate value y into register z.
    eor.something x, y(z): XORs register x into the value at the address in register z + displacement y.
    eori.something x, y(z): XORs the immediate value x into the value at the address in register z + displacement y.

    The following 68k code:
    Code (Text):
        eori.b #$0002, d0
    Translated should be like this:
    Code (Text):
        d0 ^= 0x0002;
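
    The other bitwise instructions follow the same pattern. A quick sketch at byte size; the helper names are invented for this example, and the (src, dst) argument order just mirrors the assembly:

```c
#include <assert.h>
#include <stdint.h>

uint8_t not_b(uint8_t dst)              { return (uint8_t)~dst; } /* not.b dst      */
uint8_t and_b(uint8_t src, uint8_t dst) { return dst & src; }     /* and.b src, dst */
uint8_t or_b(uint8_t src, uint8_t dst)  { return dst | src; }     /* or.b  src, dst */
uint8_t eor_b(uint8_t src, uint8_t dst) { return dst ^ src; }     /* eor.b src, dst */

/* Usage: note that ~ flips every bit, while ! only yields 0 or 1. */
void bitwise_example(void)
{
    assert(not_b(0x0F) == 0xF0);
    assert(!0x0F == 0);
    assert(and_b(0x0F, 0x3C) == 0x0C);
    assert(or_b(0x0F, 0x30) == 0x3F);
    assert(eor_b(0x02, 0x03) == 0x01);
}
```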

    The neg and ext
    Instructions:
    neg.something x negates x (positive becomes negative and negative becomes positive)
    ext.something x extends x (will be explained later)

    The neg instruction is basically this:
    Code (Text):
        x = -x;
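    The ext instruction sign-extends: ext.w extends a byte to a word and ext.l extends a word to a long. C already handles this when a signed value is converted to a wider type, for example signed char to short and short to int.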
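
    If the register is modelled as a single uint32_t, the sign extension has to be spelled out. A rough sketch; ext_w, ext_l and neg_l are names made up for this example, and the narrowing casts assume the usual two's complement behaviour:

```c
#include <assert.h>
#include <stdint.h>

/* ext.w dN: sign-extend the low byte into the low word; the upper word stays. */
uint32_t ext_w(uint32_t reg)
{
    uint16_t word = (uint16_t)(int16_t)(int8_t)(reg & 0xFFu);
    return (reg & 0xFFFF0000u) | word;
}

/* ext.l dN: sign-extend the low word into the whole register. */
uint32_t ext_l(uint32_t reg)
{
    return (uint32_t)(int32_t)(int16_t)(reg & 0xFFFFu);
}

/* neg.l dN: subtract the value from zero (unsigned math avoids overflow traps). */
uint32_t neg_l(uint32_t reg)
{
    return 0u - reg;
}

/* Usage: 0x80 is -128 as a byte, so it extends to 0xFF80. */
void ext_example(void)
{
    assert(ext_w(0x12340080u) == 0x1234FF80u);
    assert(ext_w(0x1234007Fu) == 0x1234007Fu);
    assert(ext_l(0x00008000u) == 0xFFFF8000u);
    assert(neg_l(1u) == 0xFFFFFFFFu);
}
```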

    Multiplication and division
    muls.w y, z: multiplies the lower word of z by y (signed); the 32-bit product is stored in z.
    mulu.w y, z: multiplies the lower word of z by y (unsigned); the 32-bit product is stored in z.
    divs.w y, z: divides the 32-bit value in z by the word y (signed); the remainder will be saved in the upper word of z and the quotient will be saved in the lower word.
    divu.w y, z: divides the 32-bit value in z by the word y (unsigned); the remainder will be saved in the upper word of z and the quotient will be saved in the lower word.
    On the 68000 these four instructions only exist in the .w form shown here.

    Division looks clearly different in C, where the quotient and the remainder come from two separate operators:
    Code (Text):
        int d0 = d1 / 2;
        int d2 = d1 % 2;
    In this C code, we're declaring one variable with the result of dividing d1 by 2 and another variable with the remainder of that division (modulo).
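
    If later code reads the remainder back out of the upper word, the packed layout has to be reproduced. A sketch of the unsigned forms, assuming the 68000 word-sized instructions; mulu_w and divu_w are names invented for this example:

```c
#include <assert.h>
#include <stdint.h>

/* mulu.w src, dN: low word of dN times src, giving a 32-bit unsigned product. */
uint32_t mulu_w(uint16_t src, uint32_t dn)
{
    return (dn & 0xFFFFu) * (uint32_t)src;
}

/* divu.w src, dN: 32-bit dN divided by 16-bit src.
   Result: remainder in the upper word, quotient in the lower word. */
uint32_t divu_w(uint16_t src, uint32_t dn)
{
    assert(src != 0);   /* the real CPU raises a divide-by-zero exception here */
    uint32_t quotient  = dn / src;
    uint32_t remainder = dn % src;
    if (quotient > 0xFFFFu) {
        return dn;      /* overflow: the register is left unchanged */
    }
    return (remainder << 16) | quotient;
}

/* Usage: 103 / 10 is 10 remainder 3. */
void div_example(void)
{
    assert(divu_w(10, 103) == 0x0003000Au);
    assert(mulu_w(0xFFFF, 0x1234FFFFu) == 0xFFFE0001u);
}
```

    The condition flags (including the overflow flag) are not modelled here.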

    Branch on sign
    Instructions:
    bpl x: branch (goto x) if the result of the last operation, e.g. cmp y, z (z minus y), is positive or zero (N flag clear).
    bmi x: branch (goto x) if the result of the last operation, e.g. cmp y, z (z minus y), is negative (N flag set).

    The following 68k code:
    Code (Text):
        Generic:
            move.l #hello, d0
            move.l #world, d1
            cmp.l d0, d1
            beq.s Generic2
            move.l #$0, d0
        Generic2:
            move.l #$0, d1
            cmpi.l #$0020, d0
            bpl Generic4
        Generic3:
            clr.l d0
        Generic4:
            clr.l d1
    Will look like this:
    Code (Text):
        int d0 = hello;
        int d1 = world;

        if (d0 != d1)
        {
            d0 = 0;
        }

        d1 = 0;

        if (d0 - 0x0020 < 0)
        {
            d0 = 0;
        }

        d1 = 0;
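
    One subtlety: bmi/bpl only look at the sign bit of the 32-bit result, so they can disagree with blt/bge when the subtraction overflows. A small sketch of the difference, assuming cmp.l y, z; the function names are made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* N flag after cmp.l y, z: the top bit of the 32-bit result z - y. */
int n_flag(uint32_t z, uint32_t y)
{
    /* Unsigned subtraction wraps around, like the CPU's 32-bit result. */
    return (int)(((z - y) >> 31) & 1u);
}

int taken_bmi(uint32_t z, uint32_t y) { return n_flag(z, y); }
int taken_bpl(uint32_t z, uint32_t y) { return !n_flag(z, y); }

/* blt after cmp.l y, z: a true signed "z < y" (assumes two's complement casts). */
int taken_blt(uint32_t z, uint32_t y) { return (int32_t)z < (int32_t)y; }

/* Usage: the two agree for small values and differ on overflow. */
void sign_example(void)
{
    assert(taken_bmi(0x10u, 0x20u) == 1);
    assert(taken_bpl(0x20u, 0x20u) == 1);        /* zero counts as "plus" */
    assert(taken_bmi(0x80000000u, 1u) == 0);     /* result wraps to 0x7FFFFFFF */
    assert(taken_blt(0x80000000u, 1u) == 1);
}
```

    For everyday values "if (z - y < 0)" and "if (z < y)" behave the same, which is why the shorter translation usually works.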
    How to find if a label is a function or just a... label.
    Routines and subroutines in 68k are functions, but in the source they look exactly like labels, which can be confusing. It's not that hard to tell them apart, though: a routine generally has an 'rts' (return) at the end, which marks the end of the function, and it is reached with bsr or jsr. Those two instructions tell you that the label being jumped to is a routine; the others (jmp, bra, the conditional branches...) are just for labels.
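
    As a rough illustration of that rule, here is how a bsr target and a branch target might end up in C. ClearD0, Caller and the Skip label are hypothetical names for this example, and modelling d0 as a global is an assumption of the sketch:

```c
#include <assert.h>
#include <stdint.h>

static int32_t d0;   /* register modelled as a global (an assumption) */

/* ClearD0 is reached with bsr/jsr and ends in rts, so it becomes a function. */
void ClearD0(void)
{
    d0 = 0;          /* clr.l d0 */
    return;          /* rts */
}

/* Caller uses both kinds of target: a subroutine call and a plain branch. */
int32_t Caller(int32_t value)
{
    d0 = value;      /* move.l value, d0 */
    if (d0 < 0) {    /* tst.l d0 / bpl.s Skip: Skip is only a label */
        ClearD0();   /* bsr ClearD0: ClearD0 is a routine */
    }
                     /* Skip: */
    return d0;       /* rts */
}

/* Usage: negative inputs get cleared, others pass through. */
void routine_example(void)
{
    assert(Caller(-5) == 0);
    assert(Caller(7) == 7);
}
```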

    Set conditions
    Instructions (after cmp y, z):
    seq x: set x if z and y are equal.
    sne x: set x if z and y are not equal.
    spl x: set x if the result of the cmp (z minus y) is positive or zero.
    smi x: set x if the result of the cmp (z minus y) is negative.
    sge x: set x if z is greater than or equal to y.
    sgt x: set x if z is greater than y.
    sle x: set x if z is less than or equal to y.
    slt x: set x if z is less than y.
    scc x: set x if z is greater than or equal to y. (unsigned)
    shi x: set x if z is greater than y. (unsigned)
    sls x: set x if z is less than or equal to y. (unsigned)
    scs x: set x if z is less than y. (unsigned)
    For example, seq d0 sets the low byte of d0 to 0xFF if the condition is true; otherwise that byte is cleared (set to zero). The upper 24 bits of d0 are not changed.

    Consider the following code
    Code (Text):
        Generic:
            move.l #hello, d0
            move.l #world, d1
            cmp.l d0, d1
            beq.s Generic2
            move.l #$0, d0
        Generic2:
            move.l #$0, d1
            cmpi.l #$0020, d0
            bpl Generic4
        Generic3:
            clr.l d0
        Generic4:
            clr.l d1
            cmp.l d1, d0
            seq d0
    Translating it, we get:
    Code (Text):
        int d0 = hello;
        int d1 = world;

        if (d0 != d1)
        {
            d0 = 0;
        }

        d1 = 0;

        if (d0 - 0x0020 < 0)
        {
            d0 = 0;
        }

        d1 = 0;

        // seq only writes the low byte of d0: 0xFF if equal, 0x00 otherwise
        d0 = (d0 & ~0xFF) | ((d0 == d1) ? 0xFF : 0x00);
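
    On the 68000 the set instructions write a single byte, so a general helper could look roughly like this. The name scc_b is made up for this example; the caller passes in the already evaluated condition:

```c
#include <assert.h>
#include <stdint.h>

/* Scc dN: the low byte becomes 0xFF if the condition holds, 0x00 otherwise.
   The upper 24 bits of the register are kept. */
uint32_t scc_b(uint32_t dn, int condition)
{
    return (dn & 0xFFFFFF00u) | (condition ? 0xFFu : 0x00u);
}

/* Usage: "cmp.l d1, d0 / seq d0" with d0 and d1 as plain variables. */
void scc_example(void)
{
    uint32_t d0 = 0x12345678u, d1 = 0x12345678u;
    d0 = scc_b(d0, d0 == d1);
    assert(d0 == 0x123456FFu);
    d0 = scc_b(d0, d0 == d1);   /* no longer equal, so the byte is cleared */
    assert(d0 == 0x12345600u);
}
```

    In hand-written C the flag usually ends up as an ordinary boolean (flag = (a == b);), since nothing else depends on the exact 0xFF pattern.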
     
  2. MainMemory

    Your "Addition and Subtraction" section uses d1 as a pointer, which is not permitted on 68000. Aside from that, a lot of this is just common sense stuff that you'd figure out if you know C and 68000 ASM.
    The equivalent to bset #X,d0 is d0 |= 1 << X;
    The equivalent to bclr #X,d0 is d0 &= ~(1 << X);
    The equivalent to bchg #X,d0 is d0 ^= 1 << X;
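
    Wrapped up as helpers, those three lines (plus btst) could be sketched like this. The function names are invented for this example; UINT32_C(1) is used instead of a plain 1 so that bit 31 can be shifted safely, and the bit number is taken modulo 32 as it is for a data register:

```c
#include <assert.h>
#include <stdint.h>

/* bset #bit, dN: set one bit. */
uint32_t bset(uint32_t dn, unsigned bit) { return dn |  (UINT32_C(1) << (bit & 31)); }

/* bclr #bit, dN: clear one bit. */
uint32_t bclr(uint32_t dn, unsigned bit) { return dn & ~(UINT32_C(1) << (bit & 31)); }

/* bchg #bit, dN: toggle one bit. */
uint32_t bchg(uint32_t dn, unsigned bit) { return dn ^  (UINT32_C(1) << (bit & 31)); }

/* btst #bit, dN: returns the tested bit (the CPU sets Z when it is 0). */
int btst(uint32_t dn, unsigned bit) { return (int)((dn >> (bit & 31)) & 1u); }

/* Usage. */
void bit_example(void)
{
    assert(bset(0x00000000u, 31) == 0x80000000u);
    assert(bclr(0x000000FFu, 0) == 0x000000FEu);
    assert(bchg(0x00000001u, 0) == 0x00000000u);
    assert(btst(0x00000004u, 2) == 1);
}
```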
     
  3. Chris Pancake

    To be honest, I haven't had that much practice with the 68k. I've already fixed that.
     
  4. flamewing

    For what it's worth, the "register" keyword is mostly ignored by C and C++ compilers and, since C++11, it has the same semantics as whitespace in C++. C, even C11, still pretends it does something but, in practice, it does not.
     
  5. Chris Pancake

    With optimization on of course (MSVC /Ox and GCC -Ox for example), that's why I said:
    I also changed the name of this topic because this guide is really basic.
     
  6. lil-g-gamegenuis

    Actually, even when optimization is off, it's still ignored. Compilers nowadays only treat it as a keyword with no meaning. It's about as useful as the auto modifier (Note: Not the auto type)
     
  7. Revival

    This certainly seems correct, but this is something that could be done by an automatic translator. Aside from folding some lines into one another, the result of that would almost exactly match the original assembly on a line-by-line basis. It would be far from idiomatic C++.

    If a direct translation of a whole program from 68k ASM to C++ were intended, it would probably be better to make drastic changes - abandon manipulation of global variables representing the registers, for example, and instead make the subroutines pass data as actual arguments. The modern optimising C++ compiler could probably match the performance of the original assembly.
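
    To make the contrast concrete, here is a tiny sketch of the two styles, written in C for consistency with the rest of the thread. AddRegisters and add_values are hypothetical names, and the routine itself is invented for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Direct translation: registers are globals and the routine talks through them. */
static int32_t d0, d1;

void AddRegisters(void)
{
    d0 += d1;        /* add.l d1, d0 */
}                    /* rts */

/* Reworked version: same logic, but data is passed in and returned explicitly. */
int32_t add_values(int32_t a, int32_t b)
{
    return a + b;
}

/* Usage: both styles compute the same thing. */
void style_example(void)
{
    d0 = 2;
    d1 = 3;
    AddRegisters();
    assert(d0 == 5);
    assert(add_values(2, 3) == 5);
}
```

    The second form lets the compiler see the data flow, and it reads like ordinary code rather than like transcribed assembly.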
     
  8. flamewing

    They cannot, unless you are talking about really bad assembly; mostly because the modern C++ compilers which support 68k family at all are more oriented towards later models of the 68k family (which have pipelines), as well as using an ABI which makes heavy use of the stack.