Sonic and Sega Retro Message Board: Optimizing the DMA queue - Sonic and Sega Retro Message Board

Optimizing the DMA queue Now on GitHub

#16 flamewing

Posted 17 August 2014 - 01:55 PM

  • Emerald Hunter
  • Posts: 831
  • Joined: 11-October 10
  • Gender:Male
  • Location:Brasil
  • Project:Sonic Classic Heroes; Sonic 2 Special Stage Editor; Sonic 3&K Heroes (on hold)
  • Wiki edits:12
Then this is happening because Tails' tails is overwriting part of the DMA queue's RAM; my version is so much faster because it assumes this does not happen. If you want, you can send me a ROM and I will tell you exactly where the error is.

#17 KingofHarts

Posted 17 August 2014 - 03:05 PM

  • Call me back when people stop shitting in the punch bowl...
  • Posts: 1480
  • Joined: 07-August 10
  • Gender:Male
  • Wiki edits:1
EDITED POST

Posted the ROM; looking forward to hearing back.
This post has been edited by KingofHarts: 20 August 2014 - 10:39 AM

#18 flamewing

Posted 01 September 2014 - 06:42 PM

After debugging KingofHarts' issue, I found out that it was caused by a slight oversight of mine in an edge case: when deciding whether or not to break up a DMA transfer that crosses a 128kB block, I was actually checking whether the last word transferred would be on a new 128kB block or not. I updated the first post with the new version. This caused the 128kB-safe version to become slightly slower; the new times for this version are:
  • 48(11/0) cycles if the queue was full at the start (as always) [unchanged];
  • 214(37/9) cycles for DMA transfers that do not need to be split into two [increased by 4(1/0)];
  • 252(46/9) cycles if the first piece of the DMA filled the queue [increased by 8(2/0)];
  • 364(63/16) cycles if both pieces of the DMA were queued [increased by 8(2/0)].

The non-128kB-safe version remains the same speed, which is all the more reason to just align the art in ROM to avoid the issue altogether.

#19 MainMemory

Posted 01 September 2014 - 07:00 PM

  • Every day's the same old thing... Same place, different day...
  • Posts: 3369
  • Joined: 14-August 09
  • Gender:Not Telling
  • Project:SonLVL
  • Wiki edits:1,339
It may be worth mentioning that there is a way to automatically detect overflows in DPLCs using my sprite mappings macros with AS. Add these lines just before the endm in the dplcEntry macro:
	if dplcTiles <> 0
	if ((dplcTiles+(offset*$20))/131072) <> ((dplcTiles+(offset*$20)+(tiles*$20)-1)/131072)
	message	"Warning: DPLC crosses 128K boundary! line: \{MOMLINE/1.0} start: offset count: tiles overflow: $\{(dplcTiles+(offset*$20)+(tiles*$20))#131072}"
	endif
	endif

Also make sure to put "dplcTiles := 0" above the macro to initialize it. Then you can put "dplcTiles := ArtUnc_Sonic" at the top of your DPLC file and "dplcTiles := 0" at the end, and if any DPLCs cross a 128K boundary you'll get a message like "Warning: DPLC crosses 128K boundary! line: 705 start: $7C8 count: $10 overflow: $60".
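
For anyone who wants to sanity-check the arithmetic outside the assembler, the condition in that macro can be modeled roughly as below. This is only a sketch in Python and not part of the macros: the function names are made up for illustration, and the art address 0x30560 is a hypothetical value picked so that the numbers match the sample warning above. Note that the macro additionally skips the check while dplcTiles is 0.

```python
BANK = 0x20000   # 131072 bytes = one 128 kB block
TILE = 0x20      # bytes per 8x8 tile


def crosses_128k(art_base, offset, tiles):
    """True if the first and last byte of the transfer sit in different 128 kB blocks."""
    start = art_base + offset * TILE
    last = start + tiles * TILE - 1
    return start // BANK != last // BANK


def overflow_bytes(art_base, offset, tiles):
    """Number of bytes past the boundary (the 'overflow' value in the warning)."""
    return (art_base + (offset + tiles) * TILE) % BANK


art = 0x30560  # hypothetical ROM address of the art file
print(crosses_128k(art, 0x7C8, 0x10))
print(hex(overflow_bytes(art, 0x7C8, 0x10)))
```

With these inputs the transfer spans 0x3FE60 to 0x4005F, so the check fires and 0x60 bytes land in the next block. A transfer whose last byte is exactly the last byte of a block does not count as crossing.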
This post has been edited by MainMemory: 11 September 2014 - 02:06 PM

#20 KingofHarts

Posted 01 September 2014 - 11:17 PM

flamewing, on 01 September 2014 - 06:42 PM, said:

After debugging KingofHarts' issue, I found out that it was caused by a slight oversight of mine in an edge case: when deciding whether or not to break up a DMA transfer that crosses a 128kB block, I was actually checking whether the last word transferred would be on a new 128kB block or not. I updated the first post with the new version. This caused the 128kB-safe version to become slightly slower; the new times for this version are:
  • 48(11/0) cycles if the queue was full at the start (as always) [unchanged];
  • 214(37/9) cycles for DMA transfers that do not need to be split into two [increased by 4(1/0)];
  • 252(46/9) cycles if the first piece of the DMA filled the queue [increased by 8(2/0)];
  • 364(63/16) cycles if both pieces of the DMA were queued [increased by 8(2/0)].

The non-128kB-safe version remains the same speed, which is all the more reason to just align the art in ROM to avoid the issue altogether.


With 6 characters each having their own art, I have to take the slower route. Aligning them ALL would be brutal on ROM size. Thank you for this though, regardless. :D

#21 FraGag

Posted 02 September 2014 - 12:21 AM

  • Posts: 659
  • Joined: 09-January 08
  • Gender:Male
  • Location:Québec, Canada
  • Project:an assembler
  • Wiki edits:6

KingofHarts, on 01 September 2014 - 11:17 PM, said:

With 6 characters each having their own art, I have to take the slower route. Aligning them ALL would be brutal on ROM size. Thank you for this though, regardless. :D

The art doesn't need to start on a 128KB "bank"; it must simply not cross a 128KB boundary. If you have space left in a bank, you can move other, smaller data to the bank to fill it, instead of wasting the space with padding. However, if you're still developing your hack and the data is subject to change in size, you may want to do this later, otherwise you may have to reorganize the data repeatedly (and in the meantime, use the 128KB-safe version).

#22 KingofHarts

Posted 02 September 2014 - 10:03 AM

Good to know... how would I know where the first "bank" starts, exactly?

#23 MainMemory

Posted 02 September 2014 - 10:24 AM

I suppose you would have to use a listing file to determine where the art starts in the ROM, then round up to the nearest multiple of $20000. If that address is in your art file, you'll have to check the DPLCs to see if any of them do a transfer that crosses that address, and shift the alignment of the art file so that none of the DPLC entries cross that boundary. The art file itself can cross the boundary, just as long as none of the individual DPLC entries cross it.
Or if you're using AS, you could switch the DPLCs to my macro format and add the detection code, then shift the art if it gives any warnings. It might be possible to do it with ASM68K but I'd have to go searching through the manual.
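
The manual procedure described above (find the art's start address, round up to the next multiple of $20000, then check each DPLC entry against that address) can be sketched as follows. This is a rough Python model only: the function names, the art address and the entry list are all hypothetical, and entries are given as (byte offset, byte length) pairs with a non-zero length.

```python
BANK = 0x20000  # 128 kB


def next_boundary(addr):
    """Round addr up to a multiple of $20000 (returns addr itself if already aligned)."""
    return (addr + BANK - 1) // BANK * BANK


def boundary_inside(art_start, art_size):
    """First 128 kB boundary strictly inside the art file, or None if there is none."""
    b = next_boundary(art_start + 1)
    return b if b < art_start + art_size else None


def bad_entries(art_start, entries):
    """Return the (offset, length) entries whose first and last bytes straddle a boundary."""
    return [(off, ln) for off, ln in entries
            if (art_start + off) // BANK != (art_start + off + ln - 1) // BANK]


art_start, art_size = 0x3F000, 0x2000          # hypothetical values from a listing file
entries = [(0x0E00, 0x200), (0x0F00, 0x200), (0x1000, 0x200)]
print(hex(boundary_inside(art_start, art_size)))
print([(hex(o), hex(n)) for o, n in bad_entries(art_start, entries)])
```

Here the boundary at 0x40000 falls inside the art file, but only the middle entry actually straddles it; the first entry ends right at the boundary and the third starts on it, so both are fine.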

#24 Clownacy

Posted 26 September 2014 - 11:18 AM

  • Needs to make an avatar
  • Posts: 311
  • Joined: 06-July 13
  • Gender:Male
  • Location:Englandland
Every single (S2) disasm I've applied this to has blown up, ranging from fresh disasms to my own hack. The Special Stage, especially, but I have had ARZ cause the VDP to melt down too.

You can get it to trigger by applying this to a clean Git disasm, and then going to the Special Stage via level select or checkpoint. What's going on? Is this a case of something "overwriting part of the DMA queue's RAM"?

#25 flamewing

Posted 26 September 2014 - 12:56 PM

Ooh, nice catch, I knew I had forgotten something. What needs to be done is this:

Find the "SpecialStage" label and scan down to this:
	move	#$2700,sr		; Mask all interrupts
	lea	(VDP_control_port).l,a6
	move.w	#$8B03,(a6)		; EXT-INT disabled, V scroll by screen, H scroll by line
	move.w	#$8004,(a6)		; H-INT disabled
	move.w	#$8ADF,(Hint_counter_reserve).w	; H-INT every 224th scanline
	move.w	#$8230,(a6)		; PNT A base: $C000
	move.w	#$8405,(a6)		; PNT B base: $A000
	move.w	#$8C08,(a6)		; H res 32 cells, no interlace, S/H enabled
	move.w	#$9003,(a6)		; Scroll table size: 128x32
	move.w	#$8700,(a6)		; Background palette/color: 0/0
	move.w	#$8D3F,(a6)		; H scroll table base: $FC00
	move.w	#$857C,(a6)		; Sprite attribute table base: $F800
	move.w	(VDP_Reg1_val).w,d0
	andi.b	#$BF,d0
	move.w	d0,(VDP_control_port).l

Add these lines after the above block:
	clr.w	(VDP_Command_Buffer).w
	move.w	#VDP_Command_Buffer,(VDP_Command_Buffer_Slot).w

Then scan further down until you find this:
	clearRAM PNT_Buffer,$C04	; PNT buffer

and change it to this:
	clearRAM PNT_Buffer,$C00	; PNT buffer


I missed this because these had been fixed in SCH for a long, long time. An alternative fix is to skip the first change and change the latter to this:
	clearRAM PNT_Buffer,$C02	; PNT buffer

This is a buggy fix, though; you may lose DMA transfers because of the queue filling.

I updated the starting post to reflect these changes as well.

#26 flamewing

Posted 02 February 2015 - 07:35 PM

Tiddles ran into an edge case that happens with Use128kbSafeDMA = 1. After poking around, I found out that the fix to the previous edge case added another edge case, which apparently is much rarer as no one else noticed. The issue is in this code:
        move.w  d1,d0                                                   ; d0 = (src_address >> 1) & $FFFF
        subq.w  #1,d0                                                   ; To guard against the case where (d0+d3)&$FFFF == 0
        ; Note: unless you modded your Genesis for 128kB of VRAM, then d3 can be at
        ; most $7FFF here in a valid call; we will assume this is the case
        add.w   d3,d0                                                   ; d0 = ((src_address >> 1) & $FFFF) + (xfer_len >> 1) - 1
        bcs.s   .double_transfer                                ; Carry set = ($10000 << 1) = $20000, or new 128kB block

When the source address is exactly at a 128kB boundary (that is, at the start of a 128kB block), d0 is 0, so the "subq.w #1,d0" wraps it around to $FFFF and the "add.w d3,d0" then incorrectly sets the carry flag for any non-zero length. The DMA queue will then break up the DMA into a zero-length DMA* (bad) and a DMA with the remainder of the transfer.

The fix is rather simple, and comes at no cost; it also has an additional benefit: it handles another edge case, that of a zero-length DMA*. You want to replace that bit of code with this:
        ; Note: unless you modded your Genesis for 128kB of VRAM, then d3 can be at
        ; most $7FFF here in a valid call; we will assume this is the case
        move.w  d3,d0                                                   ; d0 = length of transfer in words
        ; Compute position of last transferred word. This handles 2 cases:
        ; (1) zero-length DMAs, which actually transfer $10000 words
        ; (2) (source+length)&$FFFF == 0
        subq.w  #1,d0
        add.w   d1,d0                                                   ; d0 = ((src_address >> 1) & $FFFF) + ((xfer_len >> 1) - 1)
        bcs.s   .double_transfer                                ; Carry set = ($10000 << 1) = $20000, or new 128kB block

I updated the version on the initial post with this, and added some other changes I had made locally as well (which not everyone will like, as it involves a macro).

* = As you know, a zero-length DMA is actually a DMA with length of $10000 words, which transfers 128kB of data.
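
To see the difference between the two orderings without stepping through a debugger, the 16-bit arithmetic can be modeled as below. This is an illustrative Python sketch, not the queue code itself: `add_w`, `old_check` and `new_check` are made-up names, d1 stands for (src_address >> 1) & $FFFF and d3 for the length in words, as in the comments above.

```python
def add_w(a, b):
    """Model of a 68000 add.w: return (16-bit result, carry flag)."""
    total = (a & 0xFFFF) + (b & 0xFFFF)
    return total & 0xFFFF, total > 0xFFFF


def old_check(d1, d3):
    """Old order: subq.w #1 on the source word address, then add.w d3,d0."""
    d0 = (d1 - 1) & 0xFFFF
    return add_w(d0, d3)[1]


def new_check(d1, d3):
    """New order: subq.w #1 on the length in words, then add.w d1,d0."""
    d0 = (d3 - 1) & 0xFFFF
    return add_w(d0, d1)[1]


# (d1, d3) pairs: source at a block start, transfer crossing a boundary,
# transfer ending exactly on a boundary, and two zero-length ($10000 words) DMAs.
for d1, d3 in [(0x0000, 0x10), (0xFFF8, 0x10), (0xFFF0, 0x10), (0x0000, 0), (0x0001, 0)]:
    print(hex(d1), hex(d3), old_check(d1, d3), new_check(d1, d3))
```

The first pair is the bug described above: the old order reports a split for a transfer that starts at a block start, while the new order does not. The last pair is the extra benefit: a zero-length DMA starting one word into a block really does cross the boundary, which only the new order detects.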

#27 flamewing

Posted 17 March 2015 - 05:23 PM

Bumping again to add some constants that were needed by all non-Git-S2 disassemblies which I didn't include in the previous update. These are:
; values for the type argument
VRAM = %100001
CRAM = %101011
VSRAM = %100101

; values for the rwd argument
READ = %001100
WRITE = %000111
DMA = %100111

They should be defined somewhere in your disassembly; I updated the OP with them.
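
For context on what these bit patterns encode: the type and rwd values are ANDed together to give the 6-bit VDP access code, which is then scattered into the 32-bit command word alongside the address. The Python sketch below models this the way the vdpComm function in the Sonic 2 Git disassembly does it, as far as I recall; treat the exact formula and the name `vdp_comm` as assumptions and compare against your own disassembly.

```python
VRAM, CRAM, VSRAM = 0b100001, 0b101011, 0b100101   # values for the type argument
READ, WRITE, DMA = 0b001100, 0b000111, 0b100111    # values for the rwd argument


def vdp_comm(addr, type_, rwd):
    """Build a 32-bit VDP command word (assumed equivalent of vdpComm)."""
    code = type_ & rwd                     # 6-bit access code CD5..CD0
    return (((code & 0x03) << 30)          # CD1..CD0 -> bits 31..30
            | ((addr & 0x3FFF) << 16)      # A13..A0  -> bits 29..16
            | ((code & 0xFC) << 2)         # CD5..CD2 -> bits 7..4
            | ((addr & 0xC000) >> 14))     # A15..A14 -> bits 1..0


print(hex(vdp_comm(0xF800, VRAM, WRITE)))
print(hex(vdp_comm(0x0000, CRAM, WRITE)))
print(hex(vdp_comm(0x0000, VRAM, DMA)))
```

The three calls produce the familiar command words for a VRAM write to $F800, a CRAM write to 0 and a VRAM DMA to 0.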

#28 flamewing

Posted 18 March 2015 - 02:44 PM

Quadruple post to mention a fix to another edge case.

If you have Use128kbSafeDMA set to 1, there is one set of cases where the function won't work correctly: if the transfer length is 64kB or more (d3 = $8000 or more). For an unmodified Genesis, with the normal amount of VRAM, this is an issue only in the case of exactly 64kB (d3 = $8000): in this case, the second transfer will be wrong. For Tera Drives and modified Genesis consoles with 128kB of VRAM, the other cases also become an issue. I fixed this case by default, which makes the function slower by 4(1/0) cycles in one case (the DMA is broken in two and both pieces are correctly queued); all other cases are unchanged.

If you want the old behavior (for example because you never use a transfer of 64kB and you are not making a hack targeting machines with 128kB of VRAM), just set the variable AssumeMax7FFFXfer to 1.

#29 flamewing

Posted 12 July 2015 - 10:36 AM

In a record-setting quintuple post, I have an announcement and a bugfix.

The announcement is that the improved DMA queue (and instructions for its use) can now be obtained on GitHub. I will no longer update the OP with new developments, but I will post new things in the thread.

The bugfix: this is actually an issue only with S&K's Perform_DPLC function and 128kB-safe DMA. If you don't have the option enabled, or you are not using Perform_DPLC, then you are not affected.

The issue is that Perform_DPLC expects the high word of d3 to be unchanged by a call to the DMA function, and this was not true in the case where the DMA was split in two with the 128kB-safe option enabled. The fix is here if you want to apply it manually.