The MULU and MULS instructions are known to be quite slow. They require at least 38 cycles to execute, with the worst case being 70 cycles. If you are multiplying with rather basic and small numbers, this isn't ideal. So, with this guide, I can show you an alternative way to perform multiplication, by just using addition and bit shifts. The best way I can visualize this is basically how you can take the number 1 and expand it in a way that matches the multiplier. The easiest way I can show this is in binary. Let's say we want to multiply by 10 ($0A). In binary that is 1010. So, how can we take the number 1 and expand it into that? Lemme show you... First, let's start off with our number 1, and let's bit shift it to the left 1 time. Code (Text): 0001 -> 0010 If you look closely, you can see that one of the bits in $0A now matches up. 0010 1010 So, how can we get the other bit set? Well, what we can do is save our current iteration for later, and keep going. Let's now bit shift to the left twice. Code (Text): 0010 -> 1000 Saved: 0010 Well, now the other bit is set, but the first bit we had is now clear, so what gives? 1000 1010 This is where the magic happens. Remember the 0010 we saved? Well, let's add it back! Code (Text): 1000 -> 1010 And hey look, it's our multiplier! This method basically consists of a series of bit shifts and saving and re-adding to get our product, thought of with the example of multiplying 1 by the multiplier. As you go on, you basically save an iteration whenever you match a bit in your multiplier, and then you add it back on later. Let's do another example. Let's say we want to multiply by 13 ($0D). In binary that is 1101. Well, from the get go, with our number 1, we can see that the lowest bit matches, so we'll put that to the side. 0001 1101 Now let's shift it to the left 2 times. Code (Text): 0001 -> 0100 Saved: 0001 We now get our next match. 0100 1101 What we can do in this instance is add this iteration to our previously saved iteration. So that makes our saved iteration 0101. Now let's do 1 more bit shift to the left. Code (Text): 0100 -> 1000 Saved: 0101 We get our final match. 1000 1101 Now let's add our saved iteration (0101) back in, and voila, it's 1101, our multiplier. We can implement the first example in m68k assembly like so, assuming we want to multiply as a 16-bit value, with d0 as our multiplicand. Code (Text): add.w d0,d0 ; 4 ; First bit shift move.w d0,d1 ; 4 ; Save iteration add.w d0,d0 ; 4 ; 2 more bit shifts add.w d0,d0 ; 4 add.w d1,d0 ; 4 ; Add saved iteration ; 20 cycles total And the second. Code (Text): move.w d0,d1 ; 4 ; Save iteration add.w d0,d0 ; 4 ; 2 bit shifts add.w d0,d0 ; 4 add.w d0,d1 ; 4 ; Add to saved iteration add.w d0,d0 ; 4 ; 1 more bit shift add.w d1,d0 ; 4 ; Add saved iteration ; 24 cycles total And as we can also see, these 2 processes are faster than the best case scenario for either MULU or MULS. Disclaimers It should be noted that this should be mainly reserved for more basic multiplications. The more bit shifts and additions you need to do, the slower it gets, and the more instructions you need to store. If you are multiplying with a rather large and complex number, then this method would be worse than just using MULU or MULS. Use this method wisely! This is not a catch-all solution for slow multiplication. If using MULU or MULS is faster, then use them. If you are dealing with negative numbers, then this can work, but be aware of any sign extensions you might need to deal with, depending on the sizes you work with. It's also much much faster to multiply with the absolute value of the multiplier and then negate the result than it is to attempt to do a series of shifts and additions to match the negative multiplier. I also do 2 additions for 2 consecutive bit shifts, because doing 2 separate additions on bytes and words is faster than a single bit shift instruction (8 cycles vs 10). However, if you are doing more than 2 shifts or are working with longwords, then using a bit shift instruction is the faster option (using bit shift instructions increases by 2 cycles per shift, vs. at least 4 for each addition).