Optimizing the DMA queue

Discussion in 'Engineering & Reverse Engineering' started by flamewing, Aug 9, 2014.

  1. The art doesn't need to start on a 128KB "bank"; it must simply not cross a 128KB boundary. If you have space left in a bank, you can move other, smaller data into the bank to fill it instead of wasting the space with padding. However, if you're still developing your hack and the data is subject to change in size, you may want to do this later; otherwise you may have to reorganize the data repeatedly. In the meantime, use the 128KB-safe version.
     
  2. RetroKoH

    Member
    Project Sonic 8x16
    Good to know... how would I know where the first "bank" starts, exactly?
     
  3. MainMemory

    Have no fear...Amy Rose is here! Tech Member
    SonLVL
    I suppose you would have to use a listing file to determine where the art starts in the ROM, then round up to the next multiple of $20000. If that address falls inside your art file, you'll have to check the DPLCs to see if any of them do a transfer that crosses that address, and shift the alignment of the art file so that none of the DPLC entries cross that boundary. The art file itself can cross the boundary, as long as none of the individual DPLC entries cross it.
    Or if you're using AS, you could switch the DPLCs to my macro format and add the detection code, then shift the art if it gives any warnings. It might be possible to do it with ASM68K but I'd have to go searching through the manual.
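    The manual check described above can be sketched as follows. This is a minimal illustration in Python, not part of any disassembly: `crossing_entries`, `art_rom_address` and the `(first_tile, tile_count)` pairs are hypothetical names, and it assumes the DPLC entries have already been decoded into tile offsets and tile counts (32 bytes per tile).

```python
BANK = 0x20000      # 128kB boundary that a single DMA must not cross
TILE_BYTES = 0x20   # one 8x8 4bpp tile is 32 bytes


def crossing_entries(art_rom_address, entries):
    """Return the (first_tile, tile_count) entries whose transfer would
    straddle a 128kB boundary.

    art_rom_address: ROM address of the art file (e.g. from a listing file).
    entries: already-decoded DPLC entries (hypothetical input format).
    """
    bad = []
    for first_tile, tile_count in entries:
        start = art_rom_address + first_tile * TILE_BYTES
        last = start + tile_count * TILE_BYTES - 1  # last byte transferred
        if start // BANK != last // BANK:
            bad.append((first_tile, tile_count))
    return bad


if __name__ == "__main__":
    # Art placed $100 bytes before a boundary: only the middle entry crosses.
    print(crossing_entries(0x1FF00, [(0, 8), (4, 8), (8, 4)]))
```

    Any entry it reports is a transfer that would straddle a $20000 boundary, so the art file's alignment needs shifting until the list comes back empty.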
     
  4. Clownacy

    Tech Member
    Every single (S2) disasm I've applied this to has blown up, ranging from fresh disasms to my own hack. The Special Stage is especially affected, but I have had ARZ cause the VDP to melt down too.

    You can get it to trigger by applying this to a clean Git disasm, and then going to the Special Stage via level select or checkpoint. What's going on? Is this a case of something "overwriting part of the DMA queue's RAM"?
     
  5. flamewing

    Emerald Hunter Tech Member
    France
    Sonic Classic Heroes; Sonic 2 Special Stage Editor; Sonic 3&K Heroes (on hold)
    Ooh, nice catch, I knew I had forgotten something. What needs to be done is this:

    Find the "SpecialStage" label and scan down to this:
    [68k] move #$2700,sr ; Mask all interrupts
    lea (VDP_control_port).l,a6
    move.w #$8B03,(a6) ; EXT-INT disabled, V scroll by screen, H scroll by line
    move.w #$8004,(a6) ; H-INT disabled
    move.w #$8ADF,(Hint_counter_reserve).w ; H-INT every 224th scanline
    move.w #$8230,(a6) ; PNT A base: $C000
    move.w #$8405,(a6) ; PNT B base: $A000
    move.w #$8C08,(a6) ; H res 32 cells, no interlace, S/H enabled
    move.w #$9003,(a6) ; Scroll table size: 128x32
    move.w #$8700,(a6) ; Background palette/color: 0/0
    move.w #$8D3F,(a6) ; H scroll table base: $FC00
    move.w #$857C,(a6) ; Sprite attribute table base: $F800
    move.w (VDP_Reg1_val).w,d0
    andi.b #$BF,d0
    move.w d0,(VDP_control_port).l[/68k]
    Add these lines after the above block:
    [68k] clr.w (VDP_Command_Buffer).w
    move.w #VDP_Command_Buffer,(VDP_Command_Buffer_Slot).w[/68k]
    Then scan further down until you find this:
    [68k] clearRAM PNT_Buffer,$C04 ; PNT buffer[/68k]
    and change it to this:
    [68k] clearRAM PNT_Buffer,$C00 ; PNT buffer[/68k]

    I missed this because these have been fixed in SCH for a long, long time. An alternative fix is to skip the first change and change the latter to this:
    [68k] clearRAM PNT_Buffer,$C02 ; PNT buffer[/68k]
    This is a buggy fix, though; you may lose DMA transfers because of the queue filling.

    I updated the starting post to reflect these changes as well.
     
  6. flamewing

    Tiddles ran into an edge case that happens with Use128kbSafeDMA = 1. After poking around, I found out that the fix to the previous edge case added another edge case, which apparently is much rarer as no one else noticed. The issue is in this code:
    [68k] move.w d1,d0 ; d0 = (src_address >> 1) & $FFFF
    subq.w #1,d0 ; To guard against the case where (d0+d3)&$FFFF == 0
    ; Note: unless you modded your Genesis for 128kB of VRAM, then d3 can be at
    ; most $7FFF here in a valid call; we will assume this is the case
    add.w d3,d0 ; d0 = ((src_address >> 1) & $FFFF) + (xfer_len >> 1) - 1
    bcs.s .double_transfer ; Carry set = ($10000 << 1) = $20000, or new 128kB block[/68k]
    When the source address is exactly at the start of a 128kB boundary, the "subq.w #1,d0" will make the "add.w d3,d0" incorrectly set the carry flag, and the DMA queue will break up the DMA into a zero-length DMA* (bad) and a DMA with the remainder of the transfer.

    The fix is rather simple, and comes at no cost; it also has an additional benefit: it handles another edge case, that of a zero-length DMA*. You want to replace that bit of code with this:
    [68k] ; Note: unless you modded your Genesis for 128kB of VRAM, then d3 can be at
    ; most $7FFF here in a valid call; we will assume this is the case
    move.w d3,d0 ; d0 = length of transfer in words
    ; Compute position of last transferred word. This handles 2 cases:
    ; (1) zero-length DMAs, which actually transfer $10000 words
    ; (2) (source+length)&$FFFF == 0
    subq.w #1,d0
    add.w d1,d0 ; d0 = ((src_address >> 1) & $FFFF) + ((xfer_len >> 1) - 1)
    bcs.s .double_transfer ; Carry set = ($10000 << 1) = $20000, or new 128kB block[/68k]
    I updated the version on the initial post with this, and added some other changes I had made locally as well (which not everyone will like, as it involves a macro).

    * = As you know, a zero-length DMA is actually a DMA with length of $10000 words, which transfers 128kB of data.
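    To see why the reordering matters, here is a small Python model of the two carry checks. It is only a sketch of the 16-bit arithmetic quoted above, not the routine itself: `old_check`, `new_check` and `really_crosses` are illustrative names, `src` stands for d1 (source address in words, low 16 bits) and `length` for d3 (length in words, with 0 meaning $10000).

```python
MASK = 0xFFFF


def old_check(src, length):
    """move.w d1,d0 / subq.w #1,d0 / add.w d3,d0 / bcs"""
    d0 = (src - 1) & MASK              # subq.w wraps to $FFFF when src == 0
    return d0 + (length & MASK) > MASK  # carry out of the 16-bit add


def new_check(src, length):
    """move.w d3,d0 / subq.w #1,d0 / add.w d1,d0 / bcs"""
    d0 = ((length & MASK) - 1) & MASK  # length 0 becomes $FFFF ($10000 - 1)
    return d0 + (src & MASK) > MASK


def really_crosses(src, length):
    """Reference answer: does the last word land past the 128kB block?"""
    words = (length & MASK) or 0x10000
    return (src & MASK) + words - 1 > MASK


if __name__ == "__main__":
    # Source exactly on a boundary: the old check wrongly asks for a split.
    print(old_check(0x0000, 0x0010), new_check(0x0000, 0x0010))
    # Zero-length ($10000 words) from a non-zero offset: the old check misses it.
    print(old_check(0x0001, 0x0000), new_check(0x0001, 0x0000))
```

    In this model the new check agrees with the reference answer for every input, because `(length - 1) & $FFFF` is always the true word count minus one, whereas the old check goes wrong whenever `src` is 0 or `length` is 0.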
     
  7. flamewing

    Bumping again to add some constants that were needed by all non-Git-S2 disassemblies which I didn't include in the previous update. These are:
    [68k]VRAM = %100001
    CRAM = %101011
    VSRAM = %100101

    ; values for the rwd argument
    READ = %001100
    WRITE = %000111
    DMA = %100111[/68k]
    and should be defined at some point. I updated the OP with it.
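    As an illustration of what these bit masks are for, the Python sketch below combines a memory type with an `rwd` value into a VDP control-port command longword. The function `vdp_comm` is a hypothetical model written for this example; it assumes the constants are ANDed together and packed into the standard command layout (CD1-0 in bits 31-30, A13-0 in bits 29-16, CD5-2 in bits 7-4, A15-14 in bits 1-0), which is how the Sonic 2 Git disassembly's `vdpComm` appears to use them.

```python
# Constants copied from the post above.
VRAM = 0b100001
CRAM = 0b101011
VSRAM = 0b100101

READ = 0b001100
WRITE = 0b000111
DMA = 0b100111


def vdp_comm(addr, mem_type, rwd):
    """Build a 32-bit VDP command (illustrative model, see assumptions above)."""
    code = mem_type & rwd                  # the six CD bits for this access
    return (((code & 0x03) << 30)          # CD1-CD0
            | ((addr & 0x3FFF) << 16)      # A13-A0
            | ((code & 0xFC) << 2)         # CD5-CD2
            | ((addr & 0xC000) >> 14))     # A15-A14


if __name__ == "__main__":
    print(hex(vdp_comm(0xC000, VRAM, WRITE)))  # VRAM write to $C000
    print(hex(vdp_comm(0x0000, CRAM, WRITE)))  # CRAM write to 0
    print(hex(vdp_comm(0xF800, VRAM, DMA)))    # DMA to VRAM $F800
```

    Under those assumptions the three example calls give the familiar values $40000003, $C0000000 and $78000083.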
     
  8. flamewing

    Quadruple post to mention a fix to another edge case.

    If you have Use128kbSafeDMA set to 1, there is one set of cases where the function won't work correctly: if the transfer length is 64kB or more (d3 = $8000 or more). For an unmodified Genesis, with the normal amount of VRAM, this is an issue only in the case of exactly 64kB (d3 = $8000): in this case, the second transfer will be wrong. For Tera Drives and modified Genesis consoles with 128kB of VRAM, the other cases also become an issue. I fixed this case by default, which makes the function slower by 4(1/0) cycles in one case (the DMA is broken in two and both pieces are correctly queued); all other cases are unmodified.

    If you want the old behavior (for example because you don't ever use a transfer of 64kB and you are not making a hack targeting machines with 128kB of VRAM), just set variable AssumeMax7FFFXfer to 1.
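    For reference, this is what a correct split should produce for such a transfer. The Python below is a plain model of splitting at a 128kB boundary, not flamewing's routine, and it does not reproduce the bug; `split_transfer` is a hypothetical name, and addresses and lengths are in words, as in d1 and d3.

```python
BANK_WORDS = 0x10000  # 128kB expressed in words


def split_transfer(src_word_addr, length_words):
    """Split a transfer so that no piece crosses a 128kB boundary.

    Returns a list of (source_word_address, length_in_words) pieces.
    """
    if not 0 < length_words <= BANK_WORDS:
        raise ValueError("length must be between 1 and $10000 words")
    offset = src_word_addr % BANK_WORDS
    first = min(length_words, BANK_WORDS - offset)  # words left in this bank
    pieces = [(src_word_addr, first)]
    if first < length_words:
        # The remainder starts exactly on the next boundary.
        pieces.append((src_word_addr + first, length_words - first))
    return pieces


if __name__ == "__main__":
    # An exact 64kB transfer (d3 = $8000) starting $4000 words before a boundary.
    for src, length in split_transfer(0xC000, 0x8000):
        print(hex(src), hex(length))
```

    With d3 = $8000 starting at word address $C000, both halves should come out as $4000 words, the second one beginning at the boundary itself.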
     
  9. flamewing

    In a record-setting quintuple post, I have an announcement and a bugfix.

    The announcement is that the improved DMA queue (and instructions for its use) can now be obtained on GitHub. I will no longer update the OP with new developments, but I will post new things in the thread.

    The bugfix: this is actually an issue only with S&K's Perform_DPLC function and 128kB-safe DMA. If you don't have the option enabled, or you are not using Perform_DPLC, then you are not affected.

    The issue is that Perform_DPLC expects the high word of d3 to be left unchanged by a call to the DMA function, and this was not true when the 128kB-safe option was enabled and the DMA was split in two. The fix is here if you want to apply it manually.
     
  10. flamewing

    There is a glitch that can sometimes happen on real hardware due to a hardware bug. This is fixed in this commit. Thanks to djohe for reminding me of this as a result of talking with MarkeyJester.

    Edit: wow, sextuple post. Someone post a combo-breaker, please :v: