Here we go, my first how-to guide as a tech member.

@Hivebrain's Sonic 1 Squared contains a large number of improvements to the game's engine, but one of the more noteworthy ones is its overhauled priority system. Said system, dare I say it, rivals S3K's priority manager, and may in fact be a viable alternative to it for Sonic 2, not least because it is more directly compatible with Sonic 2's engine. I'll show you how to port it to Sonic 2 shortly, but first, let's look at how it compares to the original manager and S3K's.

The original priority manager in Sonic 1 and 2, for reference, stores the priority value as a byte 0-7, which is read as the high byte of a word, right shifted, and'ed, and finally added to the start address of the sprite queue to make an offset to a priority section in the sprite queue, a process that takes 44 cycles:

Code (ASM):
        lea     (v_sprite_queue).w,a1
        move.w  ost_priority(a0),d0     ; get sprite priority (as high byte of a word)
        lsr.w   #1,d0                   ; d0 = priority * sizeof_priority
        andi.w  #((1<<priority_count_bits)-1)<<priority_size_bits,d0
        adda.w  d0,a1                   ; jump to priority section in queue

S3K's priority manager speeds things up significantly by instead storing a pre-calculated offset in the OST. The priority manager simply adds this offset to the queue address, taking just 24 cycles:

Code (ASM):
        lea     (v_sprite_queue).w,a1
        adda.w  ost_priority(a0),a1

Saving 20 cycles for every sprite that is queued adds up to a significant speed improvement, so it's no wonder that porting this manager into Sonic 1 and 2 is so common. However, it comes with the drawback of requiring the OST structure to be rearranged, since the priority value is a full word as opposed to a byte.
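As a sanity check on the arithmetic above, here is a minimal Python sketch of the two calculations. It assumes a section size of $80 bytes and eight sections (so the mask works out to $380); the queue base address and the "junk" byte that sits after the priority byte are made-up values for illustration only, not taken from either disassembly.

```python
# Rough model of the offset arithmetic only; not cycle-accurate.
SIZEOF_PRIORITY = 0x80       # bytes per priority section (assumption)
PRIORITY_COUNT_BITS = 3      # 8 levels -> 3 bits (assumption)
PRIORITY_SIZE_BITS = 7       # log2(0x80) (assumption)
QUEUE_BASE = 0xAC00          # hypothetical address of the sprite queue


def original_offset(priority, next_byte):
    """Mimic move.w / lsr.w #1 / andi.w from the original manager."""
    word = (priority << 8) | next_byte       # priority is the HIGH byte of the word
    word >>= 1                               # lsr.w #1 -> priority * $80 plus junk bits
    mask = ((1 << PRIORITY_COUNT_BITS) - 1) << PRIORITY_SIZE_BITS
    return word & mask                       # andi.w strips the junk bits


def s3k_offset(stored_offset):
    """S3K keeps the finished offset in the OST, so there is nothing to compute."""
    return stored_offset


# Whatever the neighbouring byte holds, the original maths gives priority * $80,
# which is exactly the value S3K would have stored up front.
for priority in range(8):
    for junk in (0x00, 0x7F, 0xFF):
        assert original_offset(priority, junk) == priority * SIZEOF_PRIORITY
        assert QUEUE_BASE + original_offset(priority, junk) == \
            QUEUE_BASE + s3k_offset(priority * SIZEOF_PRIORITY)
```

The point of the sketch is just that both managers end up at the same section; the only difference is whether the shift and mask happen every frame or once, ahead of time.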
Hivebrain's S1SQ manager takes an approach somewhere between the two: the priority value, while still a byte, is now a 2-byte stride index value (even numbers 0-$E) that is used to fetch the address of the priority section from a lookup table, taking 30 cycles:

Code (ASM):
        ; See below for full code of manager
        moveq   #0,d0
        move.b  ost_priority(a0),d0             ; get sprite priority
        movea.w Disp_OffsetList(pc,d0.w),a1     ; get RAM address for priority level (movea.w sign extends to make full 32-bit address)

While a bit slower than S3K's system, it's still a significant improvement over the original S1/S2 priority manager, and crucially, it does not require altering the OST structure, making it much easier to install in Sonic 2 or base Sonic 1.

Installing this manager in Sonic 2 is not all that difficult; in fact, it is very close to being a drop-in replacement in my ASM68K disassembly. Additionally, there is an optimization that can be made for those objects that use DisplaySprite3. Regardless of whether you're using SonicRetro AS or my ASM68K disasm, bugfixes MUST be enabled; the original game has a number of multisprite objects that mistakenly call DisplaySprite and pass garbage as the priority value. In the old system, this merely causes these sprites to display on random layers, but in the new system, this will crash the game. (Note that my ASM68K disasm isn't really ready for production use; the instructions for it are for the benefit of those who may use it in the future.)

First, let's set up equates for the new priority values. If you're using Sonic 2 AS, insert the following somewhere in s2.constants.asm:

Code (ASM):
        ; Sprite priorities
        priority_0:     equ 0
        priority_1:     equ 2
        priority_2:     equ 4
        priority_3:     equ 6
        priority_4:     equ 8
        priority_5:     equ $A
        priority_6:     equ $C
        priority_7:     equ $E

If you're using Sonic 2 ASM68K, there will already be a set of equates for these in Constants.asm; simply replace those with the ones above.
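If it helps to see how those equates relate to the lookup table, here is a small Python sketch of the idea. The queue base address is a hypothetical value, and the range check is something I added for the model; the real routine does no checking at all, which is exactly why garbage priority values are dangerous.

```python
SIZEOF_PRIORITY = 0x80    # bytes per priority section (assumption)
QUEUE_BASE = 0xAC00       # hypothetical low word of the sprite queue address

# The new equates: each priority level doubled, so it indexes a table of words.
PRIORITY = {f"priority_{n}": n * 2 for n in range(8)}

# Model of Disp_OffsetList: one 16-bit address per priority section.
DISP_OFFSET_LIST = [QUEUE_BASE + n * SIZEOF_PRIORITY for n in range(8)]


def sign_extend_word(value):
    """movea.w sign extends a 16-bit value into a full 32-bit address."""
    return value | 0xFFFF0000 if value & 0x8000 else value


def section_address(index):
    """Fetch the section address for an even index 0-$E."""
    if index % 2 or not 0 <= index <= 0xE:
        # The 68000 would happily read past the table (or from an odd address)
        # here instead of raising anything.
        raise ValueError(f"bad priority index: {index:#x}")
    return sign_extend_word(DISP_OFFSET_LIST[index // 2])


print(hex(section_address(PRIORITY["priority_3"])))
```

Note how the table only needs to store words: the sign extension turns a value like $AC00 into a full RAM address in the $FFFFxxxx range.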
Now, we need to modify the code to use the new priority values. If you're using my ASM68K disasm, you may skip the first three steps of this part, as these values are already equated to constants. If you're using Sonic 2 AS (as I assume most are currently), let's go ahead and replace the raw immediates with constants.

Find EVERY instance where the priority OST slot is loaded with an immediate value (be careful, as both a0 and a1 are used):

Code (ASM):
        move.b  #3,priority(a0)
        move.b  #3,priority(a1)

To all of these, add priority_ before the number:

Code (ASM):
        move.b  #priority_3,priority(a0)
        move.b  #priority_3,priority(a1)

Next, find every use of the subObjData macro, and do the same with the priority value there, adding priority_ before the number:

Code (ASM):
        subObjData Obj28_MapUnc_11E1C,make_art_tile(ArtTile_ArtNem_Animal_2,0,0),4,2,8,0 ; this...
        subObjData Obj28_MapUnc_11E1C,make_art_tile(ArtTile_ArtNem_Animal_2,0,0),4,priority_2,8,0 ; becomes this

Do the same with the OST data lists at Obj1C_InitData and Obj71_InitData...

Code (ASM):
        objsubdecl 0, Obj1C_MapUnc_11552, make_art_tile(ArtTile_ArtNem_BoltEnd_Rope,2,0), 4, 6 ; this...
        objsubdecl 0, Obj1C_MapUnc_11552, make_art_tile(ArtTile_ArtNem_BoltEnd_Rope,2,0), 4, priority_6 ; becomes this

...and for the prison capsule's OST data at Obj3E_ObjLoadData:

Code (ASM):
        dc.b 0, 2,$20, 4, 0 ; this...
        dc.b 0, 2,$20, priority_4, 0 ; becomes this

Now, we need to deal with all of the multi-sprite objects that use DisplaySprite3. Due to the way the multi-sprite system works, these objects cannot use the priority OST slot; instead, they pass a pre-calculated priority section offset to DisplaySprite3, which skips the calculation steps. Rather than keep this, we'll change it to pass the same index values as normal objects, which will also allow for a nice optimization.
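To make the DisplaySprite3 change concrete, here is a hedged Python sketch of the conversion from an old section offset to a new index. It assumes sections are $80 bytes each; the helper names are mine and do not exist in either disassembly. It also shows why the new values fit in a moveq, whose immediate is a signed 8-bit value, while most of the old offsets do not.

```python
SIZEOF_PRIORITY = 0x80   # bytes per priority section (assumption)


def offset_to_index(old_offset):
    """Turn an old DisplaySprite3 section offset into the new even index."""
    level, remainder = divmod(old_offset, SIZEOF_PRIORITY)
    if remainder or not 0 <= level <= 7:
        raise ValueError(f"not a valid section offset: {old_offset:#x}")
    return level * 2     # same value as the matching priority_N equate


def fits_moveq(value):
    """moveq can only load a sign-extended 8-bit immediate."""
    return -128 <= value <= 127


for level in range(8):
    old = level * SIZEOF_PRIORITY
    new = offset_to_index(old)
    print(f"level {level}: old offset {old:#05x} -> new index {new:#03x}, "
          f"moveq ok before/after: {fits_moveq(old)}/{fits_moveq(new)}")
```

Only the offset for level 0 would have fit in a moveq under the old scheme; every one of the new indexes does.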
Near every call to DisplaySprite3 is a move.w instruction that places a priority section offset in d0:

Code (ASM):
        ; Sonic 2 AS
        move.w  #object_display_list_size*2,d0
        jmp     (DisplaySprite3).l
        ; =====================
        ; Sonic 2 ASM68K
        move.w  #sizeof_priority*priority_2,d0
        jmp     (DisplaySprite3).l

Find all 22 of these, including the ones in bugfixes, and change them to this, using the priority_ equate that matches each one's multiplier:

Code (ASM):
        ; Sonic 2 AS
        moveq   #priority_2,d0
        jmp     (DisplaySprite3).l
        ; =====================
        ; Sonic 2 ASM68K
        moveq   #priority_2,d0
        jmp     (DisplaySprite3).l

Since the priority value passed to DisplaySprite3 is now a positive byte, we can optimize all of these move.w instructions to moveq.

Finally, there are two objects which calculate a priority value for an object they are spawning. Special Stage Tails (Object 10) sets the priority for his tails under loc_36864 thusly:

Code (ASM):
        ; Sonic 2 AS
        move.b  priority(a0),priority(a1)
        subi_.b #1,priority(a1)
        ; =====================
        ; Sonic 2 ASM68K
        move.b  ost_priority(a0),ost_priority(a1)
        subi_.b #1,ost_priority(a1)

Change the 'subi_.b #1' to a 'subq.b #2'. The WFZ breakable plating (Object C1) does nearly the same under loc_3C1F4:

Code (ASM):
        ; Sonic 2 AS
        move.b  priority(a0),d4
        subq.b  #1,d4
        ; =====================
        ; Sonic 2 ASM68K
        move.b  ost_priority(a0),d4
        subq.b  #1,d4

Again, change the 'subq.b #1' to a 'subq.b #2'.

At last, we are ready to install the new manager. Find the DisplaySprite routines, and replace all three of them with this:

Spoiler: Sonic 2 AS
Code (ASM):
    DisplaySprite:
        moveq   #0,d0
        move.b  priority(a0),d0                 ; get sprite priority

    DisplaySprite3:
        movea.w Disp_OffsetList(pc,d0.w),a1     ; get RAM address for priority level
        move.w  (a1),d0
        cmpi.w  #object_display_list_size-2,d0  ; is this section full? ($7E)
        bcc.s   .full                           ; if yes, branch
        addq.w  #2,d0                           ; increment sprite count
        move.w  d0,(a1)                         ; write the new count back to the section
        move.w  a0,(a1,d0.w)                    ; insert RAM address for OST of object

    .full:
        rts
    ; ===========================================================================
    Disp_OffsetList:
        dc.w Object_Display_Lists
        dc.w Object_Display_Lists+object_display_list_size
        dc.w Object_Display_Lists+(object_display_list_size*2)
        dc.w Object_Display_Lists+(object_display_list_size*3)
        dc.w Object_Display_Lists+(object_display_list_size*4)
        dc.w Object_Display_Lists+(object_display_list_size*5)
        dc.w Object_Display_Lists+(object_display_list_size*6)
        dc.w Object_Display_Lists+(object_display_list_size*7)

        if (*-Disp_OffsetList)/2<>total_object_display_lists
        fatal "Mismatch between DisplaySprite and total_object_display_lists."
        endif
    ; ===========================================================================
    DisplaySprite2:
        moveq   #0,d0
        move.b  priority(a1),d0                 ; get sprite priority
        movea.w Disp_OffsetList(pc,d0.w),a2     ; get RAM address for priority level
        move.w  (a2),d0
        cmpi.w  #object_display_list_size-2,d0  ; is this section full? ($7E)
        bcc.s   .full                           ; if yes, branch
        addq.w  #2,d0                           ; increment sprite count
        move.w  d0,(a2)                         ; write the new count back to the section
        move.w  a1,(a2,d0.w)                    ; insert RAM address for OST of object

    .full:
        rts

Spoiler: Sonic 2 ASM68K
Code (ASM):
    ; ---------------------------------------------------------------------------
    ; Subroutine to add an object to the sprite queue for display by BuildSprites
    ; DisplaySprite3 is used by multi-sprite objects

    ; input:
    ;   a0 = address of OST for object
    ;   d0.b = index to priority section (DisplaySprite3 only)

    ;   uses d0.w, a1
    ; ---------------------------------------------------------------------------

    DisplaySprite:
        moveq   #0,d0
        move.b  ost_priority(a0),d0             ; get sprite priority

    DisplaySprite3:
        movea.w Disp_OffsetList(pc,d0.w),a1     ; get RAM address for priority level
        move.w  (a1),d0
        cmpi.w  #sizeof_priority-2,d0           ; is this section full? ($7E)
        bcc.s   .full                           ; if yes, branch
        addq.w  #2,d0                           ; increment sprite count
        move.w  d0,(a1)
        move.w  a0,(a1,d0.w)                    ; insert RAM address for OST of object

    .full:
        rts

    Disp_OffsetList:
        dc.w v_sprite_queue
        dc.w v_sprite_queue+sizeof_priority
        dc.w v_sprite_queue+(sizeof_priority*2)
        dc.w v_sprite_queue+(sizeof_priority*3)
        dc.w v_sprite_queue+(sizeof_priority*4)
        dc.w v_sprite_queue+(sizeof_priority*5)
        dc.w v_sprite_queue+(sizeof_priority*6)
        dc.w v_sprite_queue+(sizeof_priority*7)

        if ((*-Disp_OffsetList)/2)<>countof_priority
        inform 3,"Mismatch between DisplaySprite and countof_priority."
        endc

    ; ---------------------------------------------------------------------------
    ; Subroutine to add a child object to the sprite queue
    ;
    ; input:
    ;   a1 = address of OST for object

    ;   uses d0.w, a2
    ; ---------------------------------------------------------------------------

    DisplaySprite2:
        moveq   #0,d0
        move.b  ost_priority(a1),d0             ; get sprite priority
        movea.w Disp_OffsetList(pc,d0.w),a2     ; get RAM address for priority level
        move.w  (a2),d0
        cmpi.w  #sizeof_priority-2,d0           ; is this section full? ($7E)
        bcc.s   .full                           ; if yes, branch
        addq.w  #2,d0                           ; increment sprite count
        move.w  d0,(a2)
        move.w  a1,(a2,d0.w)                    ; insert RAM address for OST of object

    .full:
        rts

That should more or less be it. You now have a priority manager that is 16 cycles faster and 52 bytes smaller (the new routine only uses 72 bytes as opposed to 80, and the moveq optimizations free up another 44), all without any RAM modifications! Credit me if you use it, report any issues you discover, and of course, happy hacking!
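P.S. For anyone who wants to sanity-check the queue logic outside an assembler, here is a rough Python model of what the routine does to one priority section. It assumes $80-byte sections with the first word holding the byte offset of the last entry, as the $7E "full" check implies; the OST addresses and spacing used in the demo are invented.

```python
SIZEOF_PRIORITY = 0x80     # bytes per priority section (assumption)
COUNTOF_PRIORITY = 8       # number of sections (assumption)


def new_queue():
    """Each section is modelled as a list of 16-bit words; word 0 is the count."""
    words_per_section = SIZEOF_PRIORITY // 2
    return [[0] * words_per_section for _ in range(COUNTOF_PRIORITY)]


def display_sprite(queue, priority_index, ost_address):
    """Model of DisplaySprite: returns False when the section is full."""
    section = queue[priority_index // 2]   # stands in for the Disp_OffsetList lookup
    count = section[0]                     # move.w (a1),d0
    if count >= SIZEOF_PRIORITY - 2:       # cmpi.w / bcc.s .full
        return False
    count += 2                             # addq.w #2,d0
    section[0] = count                     # move.w d0,(a1)
    section[count // 2] = ost_address      # move.w a0,(a1,d0.w)
    return True


queue = new_queue()
# Hypothetical OST addresses, queued at priority_2 (index 4).
results = [display_sprite(queue, 4, 0xB000 + i * 0x40) for i in range(64)]
print(results.count(True), "sprites queued,", results.count(False), "rejected")
```

Because the count word itself takes up the first slot, each section holds 63 sprites; the 64th request in the demo is silently dropped, just as the `.full` branch does in the real routine.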
It should be noted that this seems to have come from a thread on Sonic Stuff Research Group: https://sonicresearch.org/community...-way-of-optimizing-s1-s2s-displaysprite.5646/
Funnily enough, it didn't. It is shockingly similar though. There must only be one good way to optimise that subroutine. That post is from 2019 so of course they deserve all the credit.
You're right. There are minor differences in how they are set up, though the overall idea is very similar.

Funnily enough, this is what DisplaySprite (or actionsub) looked like in the Sonic 1 prototype:

Code (Text):
    ObjectDisplay:
        lea     (DisplayLists).w,a1
        move.b  prio(a0),d0
        andi.w  #7,d0
        lsl.w   #7,d0
        adda.w  d0,a1
        cmpi.w  #$7E,(a1)
        bcc.s   locret_8768
        addq.w  #2,(a1)
        adda.w  (a1),a1
        move.w  a0,(a1)

    locret_8768:
        rts

Note that at that point in time, the priority SST value (prio, priority, ost_priority, obPriority, sprpri) was at byte $19, not $18. It switched places with the sprite width in dots value (xdisp, width_pixels, ost_displaywidth, obActWid, sprhs) later on.

(Names in parentheses come from, in order: the Sonic 1 prototype disassembly I linked to above, the Sonic 2 AS disassembly, Hivebrain's 2021/2022 Sonic 1 disassembly (and OrionNavattan's Sonic 2 ASM68K), Sonic Retro's Sonic 1 disassembly, and Sonic CD as seen in Sonic Gems Collection.)
Funnily enough, I have a similar version of that routine, with the only real change being that the priorities aren't pre-multiplied by 2, which adds 4 cycles compared to this one. It was made by Hame, and given better register use (and also adapted to Sonic 2) by me. It's a pretty easy speedup if you don't want to change every single object's priority data, though for Sonic 2 you will need to change objects that use DisplaySprite3. That's not important for Sonic 1. The better register use is just useful in general.

make_priority is Object_Display_Lists+(prid*object_display_list_size), and object_display_list_size is $80, btw.

Code (Text):
    ; ---------------------------------------------------------------------------
    ; Subroutine to display a sprite/object, when a0 is the object RAM
    ; ---------------------------------------------------------------------------
    ; made by Hame and Malachi
    ; sub_164F4:
    DisplaySprite:
        clr.w   d0
        move.b  priority(a0),d0
        add.w   d0,d0
        movea.w DisplaySprite_PriorityLUT(pc,d0.w),a1
    DisplaySprite3:
        move.w  (a1),d0                         ; get the amount of objects in the queue
        cmp.w   #object_display_list_size-2,d0  ; is it full?
        bhs.s   .return                         ; if so, branch
        addq.w  #2,d0                           ; add to the queue amount
        move.w  d0,(a1)                         ; copy the addition to the queue amount
        move.w  a0,(a1,d0.w)                    ; copy the object's address to the queue
    .return:
        rts
    ; End of function DisplaySprite
    DisplaySprite_PriorityLUT:
        dc.w make_priority(0)
        dc.w make_priority(1)
        dc.w make_priority(2)
        dc.w make_priority(3)
        dc.w make_priority(4)
        dc.w make_priority(5)
        dc.w make_priority(6)
        dc.w make_priority(7)
    ; ---------------------------------------------------------------------------
    ; Subroutine to display a sprite/object, when a1 is the object RAM
    ; ---------------------------------------------------------------------------
    ; sub_16512:
    DisplaySprite2:
        clr.w   d0
        move.b  priority(a1),d0
        add.w   d0,d0
        movea.w DisplaySprite_PriorityLUT(pc,d0.w),a2
        move.w  (a2),d0                         ; get the amount of objects in the queue
        cmp.w   #object_display_list_size-2,d0  ; is it full?
        bhs.s   .return                         ; if so, branch
        addq.w  #2,d0                           ; add to the queue amount
        move.w  d0,(a2)                         ; copy the addition to the queue amount
        move.w  a1,(a2,d0.w)                    ; copy the object's address to the queue
    .return:
        rts
    ; End of function DisplaySprite2
    ; ---------------------------------------------------------------------------
    ; Subroutine to display a sprite/object, when a0 is the object RAM
    ; and a1 is already make_priority(prid)
    ; ---------------------------------------------------------------------------
    ; loc_16530:
    ;DisplaySprite3:
    ;   move.w  (a1),d0                         ; get the amount of objects in the queue
    ;   cmp.w   #object_display_list_size-2,d0  ; is it full?
    ;   bhs.s   .return                         ; if so, branch
    ;   addq.w  #2,d0                           ; add to the queue amount
    ;   move.w  d0,(a1)                         ; copy the addition to the queue amount
    ;   move.w  a0,(a1,d0.w)                    ; copy the object's address to the queue
    ;.return:
    ;   rts
That saved me 2 cycles. Every little helps, especially for code running 10 or 15 times a frame. I don't know about Sonic 2, but in Sonic 1 priority level 7 isn't used. Removing it saves $80 bytes RAM and some CPU time in BuildSprites.
Sonic 2 uses all eight priority levels (0-7), so unfortunately it still needs the full sprite queue. RealMalachi's optimization still applies regardless. (Taking a nod from RISC architectures and loading values into a register to perform consecutive operations before writing back to memory saves a few cycles, since we don't have to perform memory reads/writes for each operation.)
I think you could potentially get away with removing priority 7 in Sonic 2, as only three objects ever use it (the big pylon in CPZ, the CNZ boss, and the orbs in MTZ's boss).
Heh, thanks. And yeah you can definitely get away with removing priority 7, MTZ boss orbs might need some finetuning but the others should just work
Updated with @RealMalachi's optimization and fixed installation instructions (I missed an additional object that loads priority values from an array and two others that calculate another object's priority from their own).