That Cannon Fodder theme reminded me of this on topic, I never knew the GBC was that powerful sound-wise, especially since I never owned one :<
Hmm, when people compare CPUs by clock speed (Hz), it's always struck me how close to meaningless that number is by itself... The clock speed is important, sure, but how many instructions can be completed per cycle, and how complex those instructions are, is just as important. A 3GHz Pentium 4 gets its ass handed to it by a 3GHz Core 2 Duo, yet they share the same clock speed. An 8MHz 68000 can similarly defeat an 8MHz GBZ80 (Game Boy Color in double speed mode, very rarely used in commercial applications).
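To put rough numbers on that point, here is a back-of-the-envelope sketch. The function name and the cycles-per-instruction figures are made-up placeholders for illustration, not measured values for any real chip.

```python
def instructions_per_second(clock_hz, avg_cycles_per_instruction):
    """Rough throughput: clock rate divided by average cycles per instruction."""
    return clock_hz / avg_cycles_per_instruction


# Two hypothetical CPUs at the same 8 MHz clock. The CPI values are
# illustrative assumptions only, not datasheet figures.
cpu_a = instructions_per_second(8_000_000, 4.0)
cpu_b = instructions_per_second(8_000_000, 8.0)

print(f"CPU A: {cpu_a:,.0f} instructions/s")
print(f"CPU B: {cpu_b:,.0f} instructions/s")
print(f"A is {cpu_a / cpu_b:.1f}x faster at the same clock")
```

Even this leaves out half the story, since it counts instructions and says nothing about how much work each one does.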
That's partially because the 68000 has a 24-bit address bus and 32-bit registers (although only a 16-bit ALU) while the GBZ80 (like all Z80 and 8080 variants) is a purely 8-bit processor (although it can chain two 8-bit registers together for use as a 16-bit data or address register). The P4 and Core 2 Duo are much closer in architecture, and in fact a big reason the Core 2 Duo wins is that it's a dual-core processor while the P4 is single-core. It's not the only reason, as the C2D is more efficient in general, but then again the C2D is 64-bit while most P4s are 32-bit (only the later models got 64-bit support). It's similar to the difference between the 68000 and the 68020/030/040/060. Sure, they use the same basic instruction set, but the latter processors are more efficient and have additional features that allow them to execute code even faster at the same clock rate. The comparison between the 68000 and the GBZ80 is a lot less informative than a comparison between processors in the same family, since they are built on completely different architectures. It's like saying that an 8MHz 68000 can beat an 8MHz 6502 variant - of course it can, because the 6502 is only 8-bit (excepting the 65816 variant), while the 68000 is closer to 32-bit than 16-bit.
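A quick way to see why ALU width matters at the same clock: a wide addition has to be done in ALU-sized pieces, with the carry passed from one piece to the next. The sketch below is a toy model (the function `add_in_chunks` is invented for illustration) and is not a cycle-accurate description of the 68000, the GBZ80, or any other chip.

```python
def add_in_chunks(a, b, total_bits, alu_bits):
    """Add two total_bits-wide integers using an alu_bits-wide adder.

    Returns (result, number_of_alu_additions). The carry is propagated
    from one chunk to the next, like an add-with-carry chain.
    """
    mask = (1 << alu_bits) - 1
    result = 0
    carry = 0
    steps = 0
    for shift in range(0, total_bits, alu_bits):
        chunk = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (chunk & mask) << shift
        carry = chunk >> alu_bits
        steps += 1
    return result, steps


# Same 32-bit addition on hypothetical 8-, 16- and 32-bit adders.
for width in (8, 16, 32):
    value, steps = add_in_chunks(0x0001FFFF, 1, 32, width)
    print(f"{width:2d}-bit ALU: result {value:#010x} in {steps} addition(s)")
```

The result is identical every time; only the number of passes through the adder changes (four, two, and one here).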
And even then all that is meaningless because the real cap is always the non-CPU hardware, which has huge latency on its communication buses and causes the CPU to wait needlessly. In fact, this is the main headache for GPU programmers: the transfer bandwidth between CPU and GPU, which is slow as shit. Also, these days CPU architecture changes so quickly that you can't even compare two consecutive CPUs in the same family. The newer CPUs will be better at some things but worse at others.
Depends on the hardware, though. On the C64, for example, the 6510 runs more or less in lockstep with the VIC-II (discounting badlines, the 6510 always accesses on phase 2 and the VIC-II accesses on phase 1, both on the same clock cycle). This greatly speeds things up, although it's negated by the need to fetch 40 12-bit words every 8 lines, not to mention sprite fetches should they be enabled. C64 coders are lucky - the video chip is memory-mapped, so access to registers and setting up pointers to graphic data is much faster than on the NES or MD where you have to do it through a single fucking port. I wonder if it'd ever be possible to modify the Genesis to make the VDP memory-mapped instead of port-mapped. I'd be content with it only being in an emulator, if only to play around with the possibilities.
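The port-versus-memory-mapped difference can be sketched by just counting bus accesses. This is a toy model with invented class names: it assumes a single control write latches the address (a real chip may need more than one) and that the data port auto-increments. It does not reproduce the actual NES, Genesis, or C64 register layouts.

```python
class PortMappedVideo:
    """Toy model: video RAM reachable only through a control port and a data port."""

    def __init__(self, size):
        self.vram = [0] * size
        self.address = 0
        self.bus_accesses = 0

    def write_control(self, address):
        # One bus access to latch the target address (assumption of this model).
        self.address = address
        self.bus_accesses += 1

    def write_data(self, value):
        # One bus access per value; the address auto-increments afterwards.
        self.vram[self.address] = value
        self.address += 1
        self.bus_accesses += 1


class MemoryMappedVideo:
    """Toy model: video RAM sits directly in the CPU's address space."""

    def __init__(self, size):
        self.vram = [0] * size
        self.bus_accesses = 0

    def write(self, address, value):
        self.vram[address] = value
        self.bus_accesses += 1


def scattered_writes(addresses):
    """Write one value to each address on both models and count bus accesses."""
    port = PortMappedVideo(64)
    mapped = MemoryMappedVideo(64)
    for value, address in enumerate(addresses):
        port.write_control(address)
        port.write_data(value)
        mapped.write(address, value)
    assert port.vram == mapped.vram  # both end up with the same contents
    return port.bus_accesses, mapped.bus_accesses


print(scattered_writes([5, 40, 12, 63]))
```

In this model scattered writes cost twice as many accesses through the port. For one long sequential run the port side only pays for the control write once, so the gap shrinks.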
You'd have a huge number of bus conflicts, so you'd need to do so many bus requests that it'd be pointless. Seriously, you need video hardware that was designed for such access (the GBA does it). On that note, the System 16 hardware has all the video stuff mapped in memory, if I understood the MAME sources correctly.