So, the other day I was browsing through my old projects and found one that I had been working on but never completed. Since I came up with ways to considerably speed up level loading, I thought I'd share them with you. Why would you want to use this? Well, I don't know... Gotta go fast, right?!
Bear in mind that I will use the Sonic 1 HiveBrain disassembly, so for the SVN/Hg/Git disassemblies, treat this as reference only. I am unsure whether it will work with Sonic 2 or Sonic 3/Knuckles/3 and Knuckles. Most of these methods are compatible with each other and can be applied fairly easily; where that is known not to be the case, it is stated in the description.
Method 1: Title card art optimization
Method 2: Comper compressed level graphics
Method 3: Not waiting for PLC's being loaded
Misc
Method 1: Title card art optimization
Extra ROM usage: 2546 (0x09F2) bytes
Extra RAM usage: 0
Optimization level: Great
So, one very long process the game goes through each time you load a level is loading the title card art. These are Nemesis compressed tiles, and they are decompressed without interruptions. If you were playing music before this, it would freeze, which is clearly noticeable in Selbi's Sonic Erazor hack. As you can hear, it actually takes a pretty long time, and for not a very big saving of ROM space. If we uncompress the tiles, they can be loaded in only a few frames, meaning the music will not be interrupted much at all. So it is quite a good speedup, and since the space usage won't be much bigger, it's a good tradeoff for the speed. You will need the uncompressed tile-loading code from the Misc section.
You want to go to each instance of this code:
	move.l	#$70000002,($C00004).l
	lea	(Nem_TitleCard).l,a0	; load title card patterns
	bsr.w	NemDec
And, you want to replace it with:
	move.l	#$70000002,($C00004)	; set mode "VRAM Write to $B000"
	lea	Nem_TitleCard,a0	; load title card patterns
	move.l	#((Nem_TitleCard_End-Nem_TitleCard)/32)-1,d0	; the title card art length, in tiles (minus 1 for dbf)
	jsr	LoadUncArt	; load uncompressed art
You can find these in the labels loc_37B6:, loc_47D4:, and Cont_ClrObjRam:. Next, decompress artnem/ttlcards.bin. You can change the file path of this file, rename it, etc.; it's up to you. In this example, however, I am going to keep it as is. Now, go to the label Nem_TitleCard:, and you should see something like this:
; ---------------------------------------------------------------------------
; Compressed graphics - various
; ---------------------------------------------------------------------------
Nem_TitleCard:	incbin	artnem\ttlcards.bin	; title cards
	even
You want to change it to this:
; ---------------------------------------------------------------------------
; Compressed graphics - various
; ---------------------------------------------------------------------------
Nem_TitleCard:	incbin	artnem\ttlcards.bin	; title cards
Nem_TitleCard_End:
	even
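For anyone wondering what the ((Nem_TitleCard_End-Nem_TitleCard)/32)-1 expression works out to, here is a small host-side Python sketch. It is not part of the ROM; the function name and the example size are made up for illustration, and the only facts it relies on are that one tile is 32 bytes and that dbf loops one more time than the counter value.

```python
TILE_BYTES = 32  # one 8x8 tile at 4 bits per pixel


def dbf_tile_count(art_size_bytes: int) -> int:
    """Return the value to put in d0: number of tiles minus one (dbf convention)."""
    if art_size_bytes <= 0 or art_size_bytes % TILE_BYTES != 0:
        raise ValueError("uncompressed art must be a non-zero multiple of 32 bytes")
    tiles = art_size_bytes // TILE_BYTES
    if tiles > 0x10000:
        raise ValueError("dbf only looks at the low 16 bits of d0")
    return tiles - 1


if __name__ == "__main__":
    # Hypothetical size, NOT the real size of the decompressed ttlcards.bin
    print(hex(dbf_tile_count(0x1000)))
```

If the decompressed file is not a multiple of 32 bytes, the division in the assembler expression silently rounds down, so it is worth checking the file size once by hand.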
Next, let's fix the result screens, which would otherwise cause a crash. At GotThroughAct:, replace:
	moveq	#$10,d0
	jsr	(LoadPLC2).l	; load title card patterns
with:
	move.l	a0,-(sp)	; save object address to stack
	move.l	#$70000002,($C00004)	; set mode "VRAM Write to $B000"
	lea	Nem_TitleCard,a0	; load title card patterns
	move.l	#((Nem_TitleCard_End-Nem_TitleCard)/32)-1,d0	; the title card art length, in tiles (minus 1 for dbf)
	jsr	LoadUncArt	; load uncompressed art
	move.l	(sp)+,a0	; get object address from stack
And there you have it! Enjoy the speed!
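A side note on the magic numbers like $70000002: they are VDP control port commands with the VRAM address packed into them. The Python sketch below (host-side only, function names are my own) shows the packing for a VRAM write, so you can work out the value if you move the art somewhere else. It only covers the "VRAM write" command used in this tutorial.

```python
def vram_write_command(vram_addr: int) -> int:
    """Build the 32-bit VDP control word for 'VRAM write' to vram_addr."""
    if not 0 <= vram_addr <= 0xFFFF:
        raise ValueError("VRAM address must fit in 16 bits")
    low14 = vram_addr & 0x3FFF   # address bits 13-0 go into the upper word
    high2 = vram_addr >> 14      # address bits 15-14 go into the lowest two bits
    return 0x40000000 | (low14 << 16) | high2


def vram_addr_from_command(command: int) -> int:
    """Recover the VRAM address from a VRAM write control word."""
    return ((command >> 16) & 0x3FFF) | ((command & 3) << 14)


if __name__ == "__main__":
    for addr in (0xB000, 0x0000, 0x39A0):
        print(f"${addr:04X} -> ${vram_write_command(addr):08X}")
```

The three addresses printed are the ones used in this tutorial: $B000 for the title cards, $0000 for the main level art, and $39A0 for the second GHZ art file.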
Method 2: Comper compressed level graphics
Extra ROM usage: ~ 10800 bytes
Extra RAM usage: 0
Optimization level: Good
Another long process is loading the level graphics. While the processor does other things in the meantime, the game must wait for the PLC queue to finish before fading the level in, because otherwise the graphics would look terrible (glitched graphics, blank space, etc.). However, because of how fast Comper is, we can greatly speed up the loading even if we reserve the processor completely just for loading level graphics. Comper isn't as compact as Nemesis compression, though, so extra space usage is inevitable. It would be almost impossible to calculate the exact space usage, so I gave an approximation; the actual usage may vary. It is good to note that currently no level editor supports Comper, so you must recompress the files whenever you wish to edit the art. You will need the Comper compressed tile-loading code from the Misc section.
So, what you want to do first, is recompress these files from Nemesis to Comper:
artnem/8x8ghz1.bin
artnem/8x8ghz2.bin (if you have combined these files, then obviously recompress the combined file)
artnem/8x8lz.bin
artnem/8x8mz.bin
artnem/8x8sbz.bin
artnem/8x8slz.bin
artnem/8x8syz.bin
Next, go to MainLoadBlockLoad:, and replace this:
	moveq	#0,d0
	move.b	($FFFFFE10).w,d0
	lsl.w	#4,d0
	lea	(MainLoadBlocks).l,a2
with:
	moveq	#0,d0	; quickly clear d0
	move.b	($FFFFFE10).w,d0	; get level ID
	bsr.s	LoadLevelArt	; load level tiles
	lsl.w	#4,d0	; shift level ID left by 4 bits
	lea	(MainLoadBlocks).l,a2
Somewhere near MainLoadBlockLoad:, insert this code (for GHZ art files that are not combined):
; ---------------------------------------------------------------------------
; Subroutine to load level art patterns
; ---------------------------------------------------------------------------
; ||||||||||||||| S U B R O U T I N E |||||||||||||||||||||||||||||||||||||||
LoadLevelArt:
move.w d0,-(sp) ; store level ID to stack
lsl.w #2,d0 ; shift 2 bits left
move.l LLA_ArtList(pc,d0.w),a0 ; get correct entry from art file list
move.l #$40000000,d4 ; set "VRAM Write to $0000"
bsr.w LoadCompArt ; load comper compressed art
; workaround for GHZ's secondary art
cmpi.b #0,($FFFFFE10).w ; is GHZ? (the tutorial defines no ZoneID equate, so use the level ID address directly)
bne.s LLA_End ; if not, don't load art
lea Nem_GHZ_2nd,a0 ; get GHZ 2nd patterns
move.l #$79A00000,d4 ; set "VRAM Write to $39A0"
bsr.w LoadCompArt ; load comper compressed art
LLA_End:
move.w (sp)+,d0 ; get old level ID from stack again
rts ; return to caller
; list of art patterns used in levels
LLA_ArtList: dc.l Nem_GHZ_1st, Nem_LZ, Nem_MZ, Nem_SLZ, Nem_SYZ, Nem_SBZ
(for combined GHZ art files)
; ---------------------------------------------------------------------------
; Subroutine to load level art patterns
; ---------------------------------------------------------------------------
; ||||||||||||||| S U B R O U T I N E |||||||||||||||||||||||||||||||||||||||
LoadLevelArt:
	move.w	d0,-(sp)	; store level ID to stack
	lsl.w	#2,d0	; shift 2 bits left
	move.l	LLA_ArtList(pc,d0.w),a0	; get correct entry from art file list
	move.l	#$40000000,d4	; set "VRAM Write to $0000"
	bsr.w	LoadCompArt	; load comper compressed art
	move.w	(sp)+,d0	; get old level ID from stack again
	rts	; return to caller
; list of art patterns used in levels
LLA_ArtList:	dc.l Nem_GHZ, Nem_LZ, Nem_MZ, Nem_SLZ, Nem_SYZ, Nem_SBZ
Next, we need to remove the pointers to level art from _inc/Pattern load cues.asm. These pointers exist for all levels; here is an example of how to do it for LZ. Originally you see this:
; ---------------------------------------------------------------------------
; Pattern load cues - Labyrinth
; ---------------------------------------------------------------------------
PLC_LZ:	dc.w $B
	dc.l Nem_LZ	; LZ main patterns
	dc.w 0
	dc.l Nem_LzBlock1	; block
	dc.w $3C00
	dc.l Nem_LzBlock2	; blocks
	dc.w $3E00
	...
You want to change it to:
; ---------------------------------------------------------------------------
; Pattern load cues - Labyrinth
; ---------------------------------------------------------------------------
PLC_LZ:	dc.w $A
	dc.l Nem_LzBlock1	; block
	dc.w $3C00
	dc.l Nem_LzBlock2	; blocks
	dc.w $3E00
	...
Note how I reduced the value in the first dc.w as well? This value sets the number of PLCs to load, and since we removed the main art file, there is one less. You want to repeat this for all of the levels.
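To double-check the header value for the other levels, here is a tiny Python sketch. It assumes the first dc.w stores the number of entries minus one (which is how a dbf-driven loop would consume it); that is my assumption about this disassembly, so verify it against your own PLC lists. The function names are made up.

```python
def plc_entry_count(header: int) -> int:
    """Number of PLC entries, ASSUMING the header stores count - 1."""
    return header + 1


def plc_header(entries: int) -> int:
    """Header word for a list with the given number of entries (same assumption)."""
    if entries < 1:
        raise ValueError("a PLC list needs at least one entry")
    return entries - 1


if __name__ == "__main__":
    old_header = 0xB                                      # PLC_LZ before the edit
    new_header = plc_header(plc_entry_count(old_header) - 1)  # one entry removed
    print(f"${old_header:X} -> ${new_header:X}")
```

Under that assumption, removing exactly one entry always means lowering the header by one, which matches the $B to $A change for LZ.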
Method 3: Not waiting for PLC's being loaded
Extra ROM usage: less than 100 bytes
Extra RAM usage: 48 (0x30) bytes
Optimization level: Great
Note: You must have implemented Method 2: Comper compressed level graphics for this to work correctly.
In the original game, level loading hangs for a few seconds while it loads graphics such as badniks and the actual level tiles. This is necessary so the level does not look broken. However, since the level tiles are now decompressed with Comper, this is no longer an issue, and the other graphics can be loaded in the background, for example while the title card sequence is running. This means we don't have to wait for any graphics to finish loading before letting the player move, and they will never notice. However, in order to store more PLCs in the queue, you need to allocate more RAM. We will extend the PLC queue from $FFFFF680-$FFFFF6FF to $FFFFF650-$FFFFF6FF. So first of all, we need to move the SBZ and LZ palette cycle pointers from $FFFFF650-$FFFFF661 to somewhere else. You need to find $12 bytes of free RAM somewhere; for this example, I will use $FFFFFECA-$FFFFFEDB.
Go to loc_19F0:, loc_1A0A:, and loc_1ADA:, and replace each instance of $FFFFF650 with your desired RAM address. In my case, $FFFFFECA.
Before StartOfROM:, place these equates:
PLCQueueAdr:	= $FFFFF650	; beginning of RAM allocated for PLC
PLCQueue:	= PLCQueueAdr+4	; start of PLC queue
PLCQueueEnd:	= $FFFFF700-$20	; end of PLC queue, start of equates for PLC, for example last state of Nemesis decompression
Go to ClearPLC, and replace:
	moveq	#$1F,d0
with:
	moveq	#(((PLCQueueEnd+$20)-PLCQueueAdr)/4)-1,d0	; length of the PLC RAM
Next, in loc_16DC:, replace:
	moveq	#$15,d0
with:
	moveq	#((PLCQueueEnd-4-PLCQueue)/4)-1,d0	; length of the PLC queue RAM
Above Level_ClrVars: and End_ClrRam: replace
	move.w	#$15,d1
with:
	move.w	#((PLCQueueAdr-$FFFFF628)/4)-1,d1
These make sure the lengths of the transfers are correct, so the PLC works as it should. Next up, we should fix the PLC addresses. Replace each instance of ($FFFFF680).w with PLCQueueAdr.w, and each instance of ($FFFFF684).w with PLCQueue.w. Now we have successfully extended the PLC queue! Next, we need to make use of this extra space. So, go to Level_TtlCard:, and you should see something like this:
	move.b	#$34,($FFFFD080).w	; load title card object
Level_TtlCard:
	move.b	#$C,($FFFFF62A).w
	bsr.w	DelayProgram
	jsr	ObjectsLoad
	jsr	BuildSprites
	bsr.w	RunPLC_RAM
	move.w	($FFFFD108).w,d0
	cmp.w	($FFFFD130).w,d0	; has title card sequence finished?
	bne.s	Level_TtlCard	; if not, branch
	tst.l	($FFFFF680).w	; are there any items in the pattern load cue?
	bne.s	Level_TtlCard	; if yes, branch
	jsr	Hud_Base
Replace it with this:
	move.b	#$34,($FFFFD080).w	; load title card object
	move.w	#3,$FFFFFE04.w	; set the timer (fixes title card bug)
Level_TtlCard:
	move.b	#$C,($FFFFF62A).w	; set VBlank routine to $C (loads more tiles per VBlank than 8, which is normally used)
	bsr.w	DelayProgram	; wait for VBlank
	jsr	ObjectsLoad	; run object code
	jsr	BuildSprites	; display sprites
	bsr.w	RunPLC_RAM	; put PLC data to RAM
	move.w	($FFFFD100+8).w,d0
	cmp.w	($FFFFD100+$30).w,d0	; has title card sequence finished?
	bne.s	Level_TtlCard	; if not, branch
	move.w	($FFFFD0C0+8).w,d0	; fix for FZ crash and title card issue
	cmp.w	($FFFFD0C0+$30).w,d0	; has title card sequence finished?
	bne.s	Level_TtlCard	; if not, branch
	subi.w	#1,$FFFFFE04.w	; subtract 1 from timer
	bne.s	Level_TtlCard	; if timer is not 0, branch
	jsr	Hud_Base
If some of the levels crash, you can raise the value 3 in the second line until they no longer do. However, this works completely in vanilla Sonic 1. There is still a slight possibility that FZ can cause some issues, so let's quickly fix that. Go to Resize_FZmain:, and change:
	bsr.w	LoadPLC	; load FZ boss patterns
to
	bsr.w	LoadPLC2	; load FZ boss patterns
Never mind the above: making that change will break the explosion graphics, and you cannot cause any crashes in the original game anyway, so there is no good reason to make that change.
And there you have it!
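If you pick a different start address for the queue, you can sanity-check the three loop counters from this method on your PC first. The Python sketch below just evaluates the same expressions as the equates; the function name and dictionary keys are mine, and the fixed addresses ($FFFFF700, $FFFFF628) are the ones from this tutorial.

```python
PLC_RAM_END = 0xFFFFF700     # end of the RAM allocated for PLC
CLEAR_START = 0xFFFFF628     # start of the block cleared above Level_ClrVars


def plc_loop_counts(queue_adr: int) -> dict:
    """Evaluate the three dbf counters for a given PLCQueueAdr."""
    queue = queue_adr + 4               # PLCQueue
    queue_end = PLC_RAM_END - 0x20      # PLCQueueEnd
    return {
        "ClearPLC": ((queue_end + 0x20) - queue_adr) // 4 - 1,
        "loc_16DC": (queue_end - 4 - queue) // 4 - 1,
        "Level_ClrVars": (queue_adr - CLEAR_START) // 4 - 1,
    }


if __name__ == "__main__":
    for name, adr in (("original", 0xFFFFF680), ("extended", 0xFFFFF650)):
        counts = plc_loop_counts(adr)
        print(name, {k: f"${v:X}" for k, v in counts.items()})
```

With the original address $FFFFF680 the expressions give back the stock values ($1F, $15, $15), which is a nice confirmation that the equates describe the same layout as the hard-coded numbers they replace.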
Misc
comper compressed tile-loading
This is the piece of code needed for parts of this tutorial; you will be informed whenever it is necessary.
Right above LoadPLC:, put this piece of code:
; ===============================================================
; ---------------------------------------------------------------
; COMPER compressed art to VRAM loader
; ---------------------------------------------------------------
; INPUT:
; a0 - Source Offset
; d4 - VDP mode
; ---------------------------------------------------------------
LoadCompArt:
lea $FF0000.l,a1 ; get address of compdec buffer
bsr.s CompDec ; decompress art
lea $FF0000.l,a3 ; get address of compdec buffer again
lea $C00000.l,a6 ; get VDP data port
move.l a1,d0 ; move end address to d0
sub.l a3,d0 ; subtract the compdec buffer address from d0
lsr.l #2,d0 ; shift 2 bits to right (as we transfer longword per loop)
subq.l #1,d0 ; subtract 1 from d0 because of dbf
move #$2700,sr ; disable interrupts
move.l d4,4(a6) ; set VDP transfer mode
@loop move.l (a3)+,(a6) ; transfer next longword
dbf d0,@loop ; loop until d0 = 0
move #$2300,sr ; enable interrupts
rts
; ===============================================================
; ---------------------------------------------------------------
; COMPER Decompressor
; ---------------------------------------------------------------
; INPUT:
; a0 - Source Offset
; a1 - Destination Offset
;
; Full credits of this to Vladikcomper
; ---------------------------------------------------------------
CompDec:
@newblock
move.w (a0)+,d0 ; fetch description field
moveq #15,d3 ; set bits counter to 16
@mainloop
add.w d0,d0 ; roll description field
bcs.s @flag ; if a flag issued, branch
move.w (a0)+,(a1)+ ; otherwise, do uncompressed data
dbf d3,@mainloop ; if bits counter remains, parse the next word
bra.s @newblock ; start a new block
; ---------------------------------------------------------------
@flag moveq #-1,d1 ; init displacement
move.b (a0)+,d1 ; load displacement
add.w d1,d1
moveq #0,d2 ; init copy count
move.b (a0)+,d2 ; load copy length
beq.s @end ; if zero, branch
lea (a1,d1),a3 ; load start copy address
@loop move.w (a3)+,(a1)+ ; copy given sequence
dbf d2,@loop ; repeat
dbf d3,@mainloop ; if bits counter remains, parse the next word
bra.s @newblock ; start a new block
@end rts
If you already had a CompDec routine, you can remove the old one (or the new one, they are the same anyway).
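If you want to check your recompressed files on the PC before building, here is a Python sketch that mirrors the CompDec routine above step by step: a 16-bit description field read from the top bit down, a clear bit meaning "copy one word", and a set bit meaning "displacement byte + length byte", with a zero length ending the stream. It is my own reading of the assembly, not an official tool, and the function name is made up.

```python
def comper_decompress(data: bytes) -> bytes:
    """Reference model of the CompDec routine (words are 2 bytes, big-endian)."""
    out = bytearray()
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(data):
            raise ValueError("stream ended before the end marker")
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    while True:
        desc = int.from_bytes(take(2), "big")      # fetch description field
        for bit in range(15, -1, -1):              # 16 flags, highest bit first
            if not desc & (1 << bit):
                out += take(2)                     # uncompressed word
                continue
            disp_byte, count = take(2)
            if count == 0:                         # zero length = end of data
                return bytes(out)
            # displacement is negative, counted in words: (byte - 256) * 2 bytes
            src = len(out) + (disp_byte - 0x100) * 2
            if src < 0:
                raise ValueError("copy source before start of output")
            for _ in range(count + 1):             # dbf copies count + 1 words
                out += out[src:src + 2]
                src += 2


if __name__ == "__main__":
    # Hand-made stream: one literal word, a 3-word copy from 1 word back, end marker
    stream = bytes.fromhex("6000" "4142" "FF02" "0000")
    print(comper_decompress(stream))
```

The copy is done one word at a time on purpose, because the source may overlap the words being written, exactly like the (a3)+,(a1)+ loop in the assembly.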
uncompressed tile-loading
This is the piece of code needed for parts of this tutorial; you will be informed whenever it is necessary.
Right above LoadPLC:, put this piece of code:
; ===============================================================
; ---------------------------------------------------------------
; uncompressed art to VRAM loader
; ---------------------------------------------------------------
; INPUT:
; a0 - Source Offset
; d0 - length in tiles, minus 1
; ---------------------------------------------------------------
LoadUncArt:
	move	#$2700,sr	; disable interrupts
	lea	$C00000.l,a6	; get VDP data port
LoadArt_Loop:
	move.l	(a0)+,(a6)	; transfer 4 bytes
	move.l	(a0)+,(a6)	; transfer 4 more bytes
	move.l	(a0)+,(a6)	; and so on and so forth
	move.l	(a0)+,(a6)	;
	move.l	(a0)+,(a6)	;
	move.l	(a0)+,(a6)	;
	move.l	(a0)+,(a6)	; in total transfer 32 bytes
	move.l	(a0)+,(a6)	; which is 1 full tile
	dbf	d0,LoadArt_Loop	; loop until d0 = 0
	move	#$2300,sr	; enable interrupts
	rts
If you find any bugs, or have other suggestions, post them in the comments!
This post has been edited by Green Snake: 11 November 2014 - 09:28 AM