M68k instruction cycle counts

Discussion in 'Engineering & Reverse Engineering' started by LocalH, Oct 13, 2005.

Thread Status:
Not open for further replies.
  1. LocalH

    In my quest to fully understand the Genesis, I'm gathering data on the system's timing. I've already gathered the following from emulator source code and done calculations to confirm it, and the figures seem valid, given that the calculated framerate is right around 60Hz:

    What I need now is a full list of the cycle counts taken by each instruction, in each valid addressing mode. Does anybody have anything of the sort?
     
  2. LOst

    I would like to have them too.
     
  3. re-inferno

    I may be talking complete BS here since I don't know 68k assembly language, but can't they be calculated?

    On a MOS 6502 it went like this:

    - CPU had 8 bit registers and data bus
    - every opcode was 8 bits (since that was the smallest workable "amount" and there were fewer than 256 instructions)

    so e.g.

    lda $0800 ... 8 bit opcode + 16 bit address = 3 bytes to fetch, plus 1 cycle to read the data = 4 cycles
    lda #$F0 ... 8 bit opcode + 8 bit immediate value = 2 bytes to fetch = 2 cycles
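
As a rough sketch of that idea in Python: count one cycle for every byte the 6502 moves over its bus. The table and function names here are hypothetical, only the two LDA forms above are modelled, and the rule ignores things like page-crossing penalties and internal cycles. Note that `lda $0800` comes out at 4 cycles, not 3, because reading the data byte costs a cycle on top of fetching the 3 instruction bytes.

```python
# Hypothetical sketch: estimate 6502 cycles as one cycle per bus access.
# Only the two LDA forms from the post are modelled here.
LDA_FORMS = {
    # mode: bytes fetched for the instruction itself, extra data reads
    "immediate": {"instruction_bytes": 2, "data_reads": 0},  # lda #$F0
    "absolute": {"instruction_bytes": 3, "data_reads": 1},   # lda $0800
}


def estimate_6502_cycles(mode: str) -> int:
    """Return instruction bytes fetched plus data bytes read."""
    form = LDA_FORMS[mode]
    return form["instruction_bytes"] + form["data_reads"]


for mode in LDA_FORMS:
    print(mode, estimate_6502_cycles(mode))
```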
     
  4. Mask of Destiny

    Well, generally speaking, each memory access takes 4 cycles on the 68000. So a one-word (16-bit) instruction that doesn't access memory beyond its own fetch typically takes 4 cycles (like add.w d0, d1; long-sized operations such as add.l d0, d1 take a few cycles more). A one-word instruction that accesses memory once typically takes 8 cycles (like move.w d0, (a0)), etc.
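
A minimal Python sketch of that rule of thumb, assuming nothing more than "4 cycles per 16-bit bus access". The function name and its parameters are made up for illustration. It is a lower bound only: it will underestimate long-sized operations, MULS/MULU, DIVS/DIVU, and anything else with internal cycles.

```python
# Hypothetical sketch of the "4 cycles per memory access" rule of thumb.
BUS_ACCESS_CYCLES = 4  # one 16-bit bus access on the 68000, per the post


def estimate_68000_cycles(instruction_words: int, data_accesses: int = 0) -> int:
    """Estimate cycles as 4 * (instruction words fetched + data accesses)."""
    return BUS_ACCESS_CYCLES * (instruction_words + data_accesses)


# nop: one instruction word, no data access
print("nop           ", estimate_68000_cycles(1))
# move.w d0, (a0): one instruction word plus one word written to memory
print("move.w d0,(a0)", estimate_68000_cycles(1, data_accesses=1))
```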

    However, some instructions like MULS/MULU and DIVS/DIVU can take longer and at least on the 68020, I think some of the addressing modes took extra cycles to calculate beyond just the extra memory access (though the 68020 introduced some more complex addressing modes).

    That said, I don't know where you can get a complete list.
     
  5. LocalH

    So that would mean that a NOP, for example, would take four cycles for reading the instruction word, right? I would also assume there are no "dummy" memory accesses like on the 6502? Where, for example, even though NOP has no operand, the 6502 still reads the byte after the NOP instruction and throws it away, meaning that all instructions take at least 2 cycles.
     
  6. Mask of Destiny

    NOPs definitely take 4 cycles. The thing I'm not 100% sure about is how branch instructions fit into this. The 68000 has a prefetch feature of sorts that works like a primitive pipeline. Basically, while it's executing one instruction it fetches the next. A branch instruction would seem to interrupt this, so I'm not exactly sure what happens then.
     
  7. Weird Person

    Just a question: do I need to know how many cycles each instruction uses to do animations on the Mega Drive? Isn't there some facility in the MD/68k that lets me see how many cycles (or how much time) I've spent?

    Man, if I have to know how many cycles each instruction takes to do a good animation on the MD, 68k programming on the Genesis must be a pain in the ass.
     
  8. LocalH

    More than likely not, unless what you're doing doesn't fit in one frame (and you want it to run in a single frame). I want to know simply because I want to play around with hitting VDP registers at various points on a scanline (I know you can't hit VDP RAM but you can hit the registers). I've already made jittery raster splits, but I'm worried that it might not be easily possible to get closer than 4 cycles to a stable interrupt.

    What I'm asking will be more useful for democoding than anything - if your code is large enough that it doesn't run in one frame, then I doubt cycle-optimizing will help much.
     
  9. Sonic Hachelle-Bee

    Yes. It's clear that writing a new algorithm for your code, or splitting it to run across several frames instead of one, can help much more than cycle-optimizing.
     
  10. LocalH

    What would help would be VDP documentation that is as detailed as this VIC-II documentation, down to the cycle level, essentially. Of course, it's much easier for the C64 as the whole system runs off one unified clock. From C= Hacking issue 10:

    Code (Text):
      NTSC-M systems:

                Chip      Crystal  Dot      Processor Cycles/ Lines/
        Host    ID        freq/Hz  clock/Hz clock/Hz  line    frame
        ------  --------  -------- -------- --------- ------- ------
        VIC-20  6560-101  14318181  4090909   1022727      65    261
        C64     6567R56A  14318181  8181818   1022727      64    262
        C64     6567R8    14318181  8181818   1022727      65    263

      Later NTSC-M video chips were most probably like the 6567R8.  Note
      that the processor clock is a 14th of the crystal frequency on all
      NTSC-M systems.

      PAL-B systems:

                Chip      Crystal  Dot      Processor Cycles/ Lines/
        Host    ID        freq/Hz  clock/Hz clock/Hz  line    frame
        ------  --------  -------- -------- --------- ------- ------
        VIC-20  6561-101   4433618  4433618   1108405      71    312
        C64     6569      17734472  7881988    985248      63    312

      On the PAL-B VIC-20, the crystal frequency is simultaneously the dot
      clock, which is BTW a 4th of the crystal frequency used on the C64.
      On the C64, the crystal frequency is divided by 18 to generate the
      processor clock, which in turn is multiplied by 8 to generate the
      dot clock.

      The basic timings are the same on all 6569 revisions, and also on
      any later C64 and C128 video chips.  If I remember correctly, these
      values were the same on the C16 videochip TED as well.
    I know that the Genesis isn't set up much like the C64. I know there's the master 53MHz crystal, and that it is divided by 7 for the 68K/2612 and by 15 for the Z80/PSG. I haven't got the dot clock yet, so I don't know how it ties into the VDP, and I don't know if it'll even matter much in practice. On the C64, the dot clock mainly matters because it tells you that one 1MHz CPU cycle spans 8 pixels.
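
The divider arithmetic can be sketched in a few lines of Python. The exact crystal value below (53,693,175 Hz, the usual NTSC figure) is an assumption, since the post only says "53MHz", and the names are made up for illustration. With that value the two dividers land at roughly 7.67MHz and 3.58MHz.

```python
# Hypothetical sketch: derive the CPU clocks from the master crystal.
# Assumption: NTSC master clock of 53,693,175 Hz (the post only says "53MHz").
MASTER_CLOCK_HZ = 53_693_175


def divided_clock(divisor: int) -> float:
    """Return the master clock divided down by an integer divisor, in Hz."""
    return MASTER_CLOCK_HZ / divisor


m68k_hz = divided_clock(7)   # 68K and YM2612
z80_hz = divided_clock(15)   # Z80 and PSG

print(f"68K/2612: {m68k_hz / 1e6:.4f} MHz")
print(f"Z80/PSG:  {z80_hz / 1e6:.4f} MHz")
```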
     
  11. LOst

    LocalH, some special water effects in Sonic 3 are done during hblank and would be impossible without knowing the cycle counts of the instructions used. I know exactly what you are looking for. I have only one piece of advice: keep looking, and keep asking around 68k communities.
     
  12. Aurochs

    Instruction timings are given in the 68010 User's Manual. I believe that this is it, but that's not where I got it, so maybe not.

    EDIT: I probably should have told Simon to put it in the programming materials topic when I first came across it. Oh well.
     
  13. Mask of Destiny

    The VDP runs at some multiple of the dot clock. This is why the max number of sprites and the DMA bandwidth change when you switch between 40 cell and 32 cell mode.

    The clock signal Epicenter was using to overclock the 68K was actually 2x the pixel clock. Devster places this signal at 13.3MHz and Epicenter places it at 13.1MHz. About 13.42MHz seems the most likely value, since that's what you get when you divide the master clock by 4. That would give a dot clock of ~6.71MHz for 40 cell mode and about 5.37MHz for 32 cell mode, so it looks like master clock divisors of 8 and 10 respectively.
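
To make the divisor guess concrete, here is a small Python sketch. It assumes an NTSC master clock of 53,693,175 Hz (not stated in the thread) and takes the divisors of 8 and 10 from the estimate above, so treat the results as a working hypothesis and not as measured values. It also shows how many pixels go by per 68K cycle, which is the figure that matters for mid-scanline register writes.

```python
# Hypothetical sketch: dot clocks implied by master clock divisors of 8 and 10.
# Assumption: NTSC master clock of 53,693,175 Hz; divisors are the post's guess.
MASTER_CLOCK_HZ = 53_693_175
M68K_DIVISOR = 7
DOT_CLOCK_DIVISORS = {"40 cell": 8, "32 cell": 10}


def dot_clock_hz(mode: str) -> float:
    """Dot clock for a display mode, in Hz."""
    return MASTER_CLOCK_HZ / DOT_CLOCK_DIVISORS[mode]


def pixels_per_68k_cycle(mode: str) -> float:
    """Pixels output during one 68K clock cycle (68K divisor / dot divisor)."""
    return M68K_DIVISOR / DOT_CLOCK_DIVISORS[mode]


for mode in DOT_CLOCK_DIVISORS:
    print(f"{mode}: {dot_clock_hz(mode) / 1e6:.2f} MHz dot clock, "
          f"{pixels_per_68k_cycle(mode):.3f} pixels per 68K cycle")
```

Under these assumptions a 4-cycle jitter on the 68K corresponds to about 3.5 pixels in 40 cell mode.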
     
  14. LOst

    Can Epicenter have more sprites with the same horizontal offset on the screen then? (Remember that even if he can, Sonic and other games have a sprite link max of 80 sprites... Software limit)
     
  15. LocalH

    No, just like the C64 is forever limited to 8 sprites on a scanline, I'd imagine the VDP is limited to 320 pixels of sprite data per scanline in 40 cell mode, regardless of whether or not the sprites overlap. And also, I haven't tried to reset the sprite link table mid-frame, so I'm not sure if you could do a sprite multiplexer or not.
     
  16. Mask of Destiny

    You can. I've been told there's a game that sets the sprite table start register midscreen for a sort of reflection effect.

    Even if the limit isn't hardcoded, Epicenter didn't overclock the VDP, he overclocked the 68K.
     
  17. LocalH

    That game wouldn't be Wiz 'n' Liz by Psygnosis, would it? Its intro sequence has a nice reflection effect, but I'm not sure if it uses sprites or not.

    It wouldn't surprise me if it was, because Psygnosis always did have awesome code, as far as I'm aware (as well as being quite prolific on the Amiga).
     
  18. Mask of Destiny

    I have no idea. It was brought up in a discussion about trying to make a faster Genesis emulator for the Dreamcast (the idea was to render the sprite layer using quads on the PowerVR).
     
  19. Quickman

    No. Epicenter can have many sprites below the line limit without slowdown. It doesn't solve the tile disappearance phenomenon from having too many sprites on one line. Watch the videos on his website.

    (Also ask him about the bloopers. They're hilarious.)
     
  20. Aurochs

    Heh. I opened up my copy of the user's manual and the first thing I see is the Jameco logo. So now I know where I got it. =P

    Here it is. ~1 MB PDF.

    EDIT: Someone should add that to this post. It's a very important document.
     