
ASM Porting Sonic 1 Squared's Priority Manager to Sonic 2

Discussion in 'Engineering & Reverse Engineering' started by OrionNavattan, Jan 17, 2024.

  1. OrionNavattan

    OrionNavattan

    Tech Member
    167
    165
    43
    Oregon
    Here we go, my first how-to guide as a tech member.

    @Hivebrain's Sonic 1 Squared contains a large number of improvements to the game's engine, but one of the more noteworthy ones is its overhauled priority system. Said system, dare I say it, rivals S3K's priority manager, and may in fact be a viable alternative to it for Sonic 2, not least because it is more directly compatible with Sonic 2's engine. I'll show you how to port it to Sonic 2 shortly, but first, let's look at how it compares to the original manager and S3K's.

    The original priority manager in Sonic 1 and 2, for reference, stores the priority value as a byte 0-7. That byte is read as the high byte of a word, right shifted, and'ed, and finally added to the start address of the sprite queue to reach a priority section in the sprite queue, a process that takes 44 cycles:
    Code (ASM):
        lea (v_sprite_queue).w,a1
        move.w ost_priority(a0),d0 ; get sprite priority (as high byte of a word)
        lsr.w #1,d0 ; d0 = priority * sizeof_priority
        andi.w #((1<<priority_count_bits)-1)<<priority_size_bits,d0
        adda.w d0,a1 ; jump to priority section in queue
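
    To make the arithmetic concrete, here is the same sequence traced for an object with priority 3. The equate values are my assumptions (8 priority sections of $80 bytes each, which matches the $7E limit checked later in the routine); "xx" stands for whatever byte follows the priority byte in the OST.

```asm
; Worked trace of the original lookup for priority 3.
; Assumed values: priority_count_bits = 3, priority_size_bits = 7,
; so the mask is ((1<<3)-1)<<7 = $380 and sizeof_priority = $80.
        lea     (v_sprite_queue).w,a1
        move.w  ost_priority(a0),d0     ; d0 = $03xx (priority in the high byte)
        lsr.w   #1,d0                   ; d0 = $0180-$01FF (3*$80 plus leftover low bits)
        andi.w  #$380,d0                ; d0 = $0180 = 3 * sizeof_priority
        adda.w  d0,a1                   ; a1 = v_sprite_queue + $180 (section 3)
```

    The and is what throws away the unrelated neighbouring byte, which is also why garbage priority values were harmless in the old system.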

    S3K's priority manager speeds things up significantly by instead storing a pre-calculated offset in the OST. The priority manager simply adds this offset to the queue address, taking just 24 cycles:
    Code (ASM):
        lea (v_sprite_queue).w,a1
        adda.w ost_priority(a0),a1
    20 fewer cycles required for every sprite that is queued adds up to a significant speed improvement, so it's no wonder that porting this manager into Sonic 1 and 2 is so common. However, it comes with the drawback of requiring the OST structure to be rearranged, since the priority value is a full word as opposed to a byte.


    Hivebrain's S1SQ manager takes an approach somewhere between the two: the priority value, while still a byte, is now an index with a 2-byte stride (even numbers 0-$E) that is used to fetch the address of the priority section from a lookup table, taking 30 cycles:
    Code (ASM):
        ; See below for full code of manager
        moveq   #0,d0
        move.b   ost_priority(a0),d0           ; get sprite priority
        movea.w   Disp_OffsetList(pc,d0.w),a1       ; get RAM address for priority level (movea.w sign extends to make full 32-bit address)
    While a bit slower than S3K's system, it's still a significant improvement over the original S1/S2 priority manager, and crucially, it does not require altering the OST structure, making it much easier to install in Sonic 2 or base Sonic 1.
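
    For anyone who wants to check the totals, this is how the three lookups add up using the standard MC68000 instruction timings (assuming no wait states; the $380 mask is the assumed value of the equate expression in the original code):

```asm
; Original S1/S2 lookup                              cycles
        lea     (v_sprite_queue).w,a1               ;  8
        move.w  ost_priority(a0),d0                 ; 12
        lsr.w   #1,d0                               ;  8  (6 + 2 per bit shifted)
        andi.w  #$380,d0                            ;  8
        adda.w  d0,a1                               ;  8  total 44

; S3K lookup
        lea     (v_sprite_queue).w,a1               ;  8
        adda.w  ost_priority(a0),a1                 ; 16  total 24

; S1SQ lookup
        moveq   #0,d0                               ;  4
        move.b  ost_priority(a0),d0                 ; 12
        movea.w Disp_OffsetList(pc,d0.w),a1         ; 14  total 30
```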


    Installing this manager in Sonic 2 is not all that difficult; in fact, it is very close to being a drop-in replacement in my ASM68K disassembly. Additionally, there is an optimization that can be made for those objects that use DisplaySprite3. Regardless of whether you're using SonicRetro AS or my ASM68K disasm, bugfixes MUST be enabled: the original game has a number of multisprite objects that mistakenly call DisplaySprite and pass garbage as the priority value. In the old system, this merely causes these sprites to display on random layers, but in the new system, where the value is used unmasked as a table index, it will crash the game. (Note that my ASM68K disasm isn't really ready for production use; the instructions for it are for the benefit of those who may use it in the future.)

    First, let's set up equates for the new priority values. If you're using Sonic 2 AS, insert the following somewhere in s2.constants.asm:
    Code (ASM):
    ; Sprite priorities
    priority_0:       equ 0
    priority_1:       equ 2
    priority_2:       equ 4
    priority_3:       equ 6
    priority_4:       equ 8
    priority_5:       equ $A
    priority_6:       equ $C
    priority_7:       equ $E
    If you're using Sonic 2 ASM68K, there will already be a set of equates for these in Constants.asm; simply replace those with the ones above.

    Now, we need to modify the code to use the new priority values. If you're using my ASM68K disasm, you may skip the first three steps of this part, as these values are already equated to constants. If you're using Sonic 2 AS (as I assume most are currently), let's go ahead and replace the raw immediates with constants.

    Find EVERY instance where the priority OST slot is loaded with an immediate value (be careful, as both a0 and a1 are used):
    Code (ASM):
        move.b #3,priority(a0)
        move.b #3,priority(a1)

    To all of these, add priority_ before the number:
    Code (ASM):
        move.b #priority_3,priority(a0)
        move.b #priority_3,priority(a1)

    Next, find every use of the subObjData macro, and do the same with the priority value there, adding priority_ before the number:
    Code (ASM):
        subObjData Obj28_MapUnc_11E1C,make_art_tile(ArtTile_ArtNem_Animal_2,0,0),4,2,8,0 ; this...
        subObjData Obj28_MapUnc_11E1C,make_art_tile(ArtTile_ArtNem_Animal_2,0,0),4,priority_2,8,0 ; becomes this

    Do the same with the OST data lists at Obj1C_InitData and Obj71_InitData...
    Code (ASM):
        objsubdecl 0, Obj1C_MapUnc_11552, make_art_tile(ArtTile_ArtNem_BoltEnd_Rope,2,0), 4, 6 ; this...
        objsubdecl 0, Obj1C_MapUnc_11552, make_art_tile(ArtTile_ArtNem_BoltEnd_Rope,2,0), 4, priority_6 ; becomes this

    ...and for the prison capsule's OST data at Obj3E_ObjLoadData:
    Code (ASM):
        dc.b   0,  2,$20,  4,  0 ; this...
        dc.b   0,  2,$20,  priority_4,  0 ; becomes this

    Now, we need to deal with all of the multi-sprite objects that use DisplaySprite3. Due to the way the multi-sprite system works, these objects cannot use the priority OST slot; instead, they pass a pre-calculated priority section offset to DisplaySprite3, which skips the calculation steps. Rather than keep this, we'll change it to pass the same index values as normal objects, which will also allow for a nice optimization.

    Near every call to DisplaySprite3 is a move.w instruction that places a priority section offset in d0:
    Code (ASM):
    ; Sonic 2 AS
        move.w   #object_display_list_size*2,d0
        jmp   (DisplaySprite3).l
    ; =====================
    ; Sonic 2 ASM68K
        move.w   #sizeof_priority*priority_2,d0
        jmp   (DisplaySprite3).l

    Find all 22 of these, including the ones in bugfixes, and change each one to load the priority_ constant matching its original multiplier, like this:
    Code (ASM):
    ; Sonic 2 AS
        moveq   #priority_2,d0
        jmp   (DisplaySprite3).l
    ; =====================
    ; Sonic 2 ASM68K
        moveq  #priority_2,d0
        jmp   (DisplaySprite3).l
    Since the priority value passed to DisplaySprite3 is now a positive byte, we can optimize all of these move.w instructions to moveq.
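
    As a quick illustration of where the saving comes from (sizes and timings are the standard 68000 figures; the AS form of the original instruction is used as the example):

```asm
; before: 4 bytes (opcode word + immediate word), 8 cycles
        move.w  #object_display_list_size*2,d0
; after: 2 bytes (the immediate is packed into the opcode word), 4 cycles
        moveq   #priority_2,d0
; moveq sign-extends its 8-bit immediate to all 32 bits of d0, so for the
; values 0-$E the low word is exactly the table index DisplaySprite3 reads.
```

    Two bytes saved at each of the 22 call sites gives 44 bytes in total.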

    Finally, there are two objects which calculate a priority value for an object they are spawning. Special Stage Tails (Object 10) sets the priority for his tails under loc_36864 thusly:
    Code (ASM):
    ; Sonic 2 AS
        move.b   priority(a0),priority(a1)
        subi_.b   #1,priority(a1)
    ; =====================
    ; Sonic 2 ASM68K
        move.b   ost_priority(a0),ost_priority(a1)
        subi_.b   #1,ost_priority(a1)
    Change the 'subi_.b #1' to a 'subq.b #2' (keep the .b size; the priority is still a single byte).

    The WFZ breakable plating (Object C1) does nearly the same under loc_3C1F4:
    Code (ASM):
    ; Sonic 2 AS
        move.b   priority(a0),d4
        subq.b   #1,d4
    ; =====================
    ; Sonic 2 ASM68K
        move.b   ost_priority(a0),d4
        subq.b   #1,d4
    Again, change the 'subq.b #1' to a 'subq.b #2'.
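
    For clarity, this is roughly what the two spots should look like after the change. Sonic 2 AS names are shown; substitute ost_priority for priority in ASM68K. Note the assumption that the parent's priority is never priority_0: the old manager's mask would have wrapped an underflowed value back into range, whereas here $FE would index past the end of the table.

```asm
; Object 10 (Special Stage Tails), under loc_36864:
        move.b  priority(a0),priority(a1)
        subq.b  #2,priority(a1)         ; one priority level lower = one table entry (2 bytes) back

; Object C1 (WFZ breakable plating), under loc_3C1F4:
        move.b  priority(a0),d4
        subq.b  #2,d4                   ; same adjustment, done in a register
```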


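    At last, we are ready to install the new manager. Find the DisplaySprite routines, and replace all three of them with the listing for your disassembly (the first listing below is for Sonic 2 AS, the second for Sonic 2 ASM68K):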

    Code (ASM):
    DisplaySprite:
        moveq   #0,d0
        move.b   priority(a0),d0           ; get sprite priority

    DisplaySprite3:
        movea.w   Disp_OffsetList(pc,d0.w),a1       ; get RAM address for priority level
        move.w   (a1),d0
        cmpi.w   #object_display_list_size-2,d0           ; is this section full? ($7E)
        bcc.s   .full                   ; if yes, branch
        addq.w   #2,d0                   ; increment sprite count
        move.w   d0,(a1)                   ; write updated count back to the section header
        move.w   a0,(a1,d0.w)                   ; insert RAM address for OST of object

    .full:
        rts
    ; ===========================================================================

    Disp_OffsetList:
        dc.w Object_Display_Lists
        dc.w Object_Display_Lists+object_display_list_size
        dc.w Object_Display_Lists+(object_display_list_size*2)
        dc.w Object_Display_Lists+(object_display_list_size*3)
        dc.w Object_Display_Lists+(object_display_list_size*4)
        dc.w Object_Display_Lists+(object_display_list_size*5)
        dc.w Object_Display_Lists+(object_display_list_size*6)
        dc.w Object_Display_Lists+(object_display_list_size*7)
        if (*-Disp_OffsetList)/2<>total_object_display_lists
        fatal "Mismatch between DisplaySprite and total_object_display_lists."
        endif
    ; ===========================================================================

    DisplaySprite2:
        moveq   #0,d0
        move.b   priority(a1),d0           ; get sprite priority
        movea.w   Disp_OffsetList(pc,d0.w),a2       ; get RAM address for priority level
        move.w   (a2),d0
        cmpi.w   #object_display_list_size-2,d0           ; is this section full? ($7E)
        bcc.s   .full                   ; if yes, branch
        addq.w   #2,d0                   ; increment sprite count
        move.w   d0,(a2)                   ; write updated count back to the section header
        move.w   a1,(a2,d0.w)                   ; insert RAM address for OST of object

    .full:
        rts

    Code (ASM):
    ; ---------------------------------------------------------------------------
    ; Subroutine to add an object to the sprite queue for display by BuildSprites
    ; DisplaySprite3 is used by multi-sprite objects

    ; input:
    ;   a0 = address of OST for object
    ;   d0.b = index to priority section (DisplaySprite3 only)

    ;   uses d0.w, a1
    ; ---------------------------------------------------------------------------

    DisplaySprite:
            moveq   #0,d0
            move.b   ost_priority(a0),d0           ; get sprite priority

    DisplaySprite3:
            movea.w   Disp_OffsetList(pc,d0.w),a1       ; get RAM address for priority level
            move.w   (a1),d0
            cmpi.w   #sizeof_priority-2,d0           ; is this section full? ($7E)
            bcc.s   .full                   ; if yes, branch
            addq.w   #2,d0                   ; increment sprite count
            move.w   d0,(a1)
            move.w   a0,(a1,d0.w)               ; insert RAM address for OST of object

        .full:
            rts

    Disp_OffsetList:
            dc.w v_sprite_queue
            dc.w v_sprite_queue+sizeof_priority
            dc.w v_sprite_queue+(sizeof_priority*2)
            dc.w v_sprite_queue+(sizeof_priority*3)
            dc.w v_sprite_queue+(sizeof_priority*4)
            dc.w v_sprite_queue+(sizeof_priority*5)
            dc.w v_sprite_queue+(sizeof_priority*6)
            dc.w v_sprite_queue+(sizeof_priority*7)
            if ((*-Disp_OffsetList)/2)<>countof_priority
            inform 3,"Mismatch between DisplaySprite and countof_priority."
            endc

    ; ---------------------------------------------------------------------------
    ; Subroutine to add a child object to the sprite queue
    ;
    ; input:
    ;   a1 = address of OST for object

    ;   uses d0.w, a2
    ; ---------------------------------------------------------------------------

    DisplaySprite2:
            moveq   #0,d0
            move.b   ost_priority(a1),d0           ; get sprite priority
            movea.w   Disp_OffsetList(pc,d0.w),a2       ; get RAM address for priority level
            move.w   (a2),d0
            cmpi.w   #sizeof_priority-2,d0           ; is this section full? ($7E)
            bcc.s   .full                   ; if yes, branch
            addq.w   #2,d0                   ; increment sprite count
            move.w   d0,(a2)
            move.w   a1,(a2,d0.w)               ; insert RAM address for OST of object

        .full:
            rts

    That should more or less be it. You now have a priority manager that is 16 cycles faster (14 from the table lookup, plus 2 from the register-based queue insert) and 52 bytes smaller (the new routines use only 72 bytes as opposed to 80, and the 22 moveq optimizations free up another 44), all without any RAM modifications! Credit me if you use it, report any issues you discover, and of course, happy hacking!
     
    Last edited: Jan 19, 2024
  2. Brainulator

    Brainulator

    Regular garden-variety member Member
  3. Hivebrain

    Hivebrain

    Administrator
    3,049
    162
    43
    53.4N, 1.5W
    Github
    Last edited: Jan 18, 2024
  4. Brainulator

    Brainulator

    Regular garden-variety member Member
    You're right. There are minor differences in how they are set up, though the overall idea is very similar.

    Funnily enough, this is what DisplaySprite (or actionsub) looked like in the Sonic 1 prototype:
    Code (Text):
    ObjectDisplay:
            lea    (DisplayLists).w,a1
            move.b    prio(a0),d0
            andi.w    #7,d0
            lsl.w    #7,d0
            adda.w    d0,a1
            cmpi.w    #$7E,(a1)
            bcc.s    locret_8768
            addq.w    #2,(a1)
            adda.w    (a1),a1
            move.w    a0,(a1)

    locret_8768:
            rts
    Note that at that point in time, the priority SST value (prio, priority, ost_priority, obPriority, sprpri) was at byte $19, not $18. It switched places with the sprite width in dots value (xdisp, width_pixels, ost_displaywidth, obActWid, sprhs) later on. (Names in parentheses come from, in order: the Sonic 1 prototype disassembly I linked to above, the Sonic 2 AS disassembly, Hivebrain's 2021/2022 Sonic 1 disassembly (and OrionNavattan's Sonic 2 ASM68K), Sonic Retro's Sonic 1 disassembly, and Sonic CD as seen in Sonic Gems Collection.)
     
  5. RealMalachi

    RealMalachi

    you can call me mal Member
    Funnily I have a similar version of that routine, with the only real change being that the priorities aren't pre-multiplied by 2, which adds 4 cycles compared to this one.
    It was made by Hame, and given better register use (and also adapted to Sonic 2) by me. It's a pretty easy speedup if you don't want to change every single object's priority data, though for Sonic 2 you will need to change objects that use DisplaySprite3. Not important for Sonic 1. The better register use is just useful in general.

    make_priority is Object_Display_Lists+(prid*object_display_list_size), object_display_list_size is $80 btw
    Code (Text):
    ; ---------------------------------------------------------------------------
    ; Subroutine to display a sprite/object, when a0 is the object RAM
    ; ---------------------------------------------------------------------------
    ; made by Hame and Malachi
    ; sub_164F4:
    DisplaySprite:
        clr.w    d0
        move.b    priority(a0),d0
        add.w    d0,d0
        movea.w    DisplaySprite_PriorityLUT(pc,d0.w),a1

    DisplaySprite3:
        move.w    (a1),d0            ; get the amount of objects in the queue
        cmp.w    #object_display_list_size-2,d0    ; is it full?
        bhs.s    .return            ; if so, branch
        addq.w    #2,d0            ; add to the queue amount
        move.w    d0,(a1)            ; copy the addition to the queue amount
        move.w    a0,(a1,d0.w)        ; copy the objects address to the queue
    .return:
        rts
    ; End of function DisplaySprite

    DisplaySprite_PriorityLUT:
        dc.w make_priority(0)
        dc.w make_priority(1)
        dc.w make_priority(2)
        dc.w make_priority(3)
        dc.w make_priority(4)
        dc.w make_priority(5)
        dc.w make_priority(6)
        dc.w make_priority(7)
    ; ---------------------------------------------------------------------------
    ; Subroutine to display a sprite/object, when a1 is the object RAM
    ; ---------------------------------------------------------------------------
    ; sub_16512:
    DisplaySprite2:
        clr.w    d0
        move.b    priority(a1),d0
        add.w    d0,d0
        movea.w    DisplaySprite_PriorityLUT(pc,d0.w),a2

        move.w    (a2),d0            ; get the amount of objects in the queue
        cmp.w    #object_display_list_size-2,d0    ; is it full?
        bhs.s    .return            ; if so, branch
        addq.w    #2,d0            ; add to the queue amount
        move.w    d0,(a2)            ; copy the addition to the queue amount
        move.w    a1,(a2,d0.w)        ; copy the objects address to the queue
    .return:
        rts
    ; End of function DisplaySprite2

    ; ---------------------------------------------------------------------------
    ; Subroutine to display a sprite/object, when a0 is the object RAM
    ; and a1 is already make_priority(prid)
    ; ---------------------------------------------------------------------------
    ; loc_16530:
    ;DisplaySprite3:
    ;    move.w    (a1),d0            ; get the amount of objects in the queue
    ;    cmp.w    #object_display_list_size-2,d0    ; is it full?
    ;    bhs.s    .return            ; if so, branch
    ;    addq.w    #2,d0            ; add to the queue amount
    ;    move.w    d0,(a1)            ; copy the addition to the queue amount
    ;    move.w    a0,(a1,d0.w)        ; copy the objects address to the queue
    ;.return:
    ;    rts
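
    For anyone wanting to try this variant in Sonic 2 AS, make_priority could be defined with AS's function directive, as sketched here. The definition simply restates the formula given above the listing; the DisplaySprite3 caller is a hypothetical example of how a multi-sprite object might pass the section address in a1, not code taken from the disassembly.

```asm
; Sketch only: assumes Object_Display_Lists and object_display_list_size
; are already defined, as they are in the Sonic 2 AS disassembly.
make_priority function prid,Object_Display_Lists+(prid*object_display_list_size)

; Hypothetical multi-sprite object under this variant: DisplaySprite3 now
; expects the address of the priority section in a1, not an offset in d0.
    lea     (make_priority(2)).w,a1
    jmp     (DisplaySprite3).l
```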
     
  6. Hivebrain

    Hivebrain

    Administrator
    3,049
    162
    43
    53.4N, 1.5W
    Github
    That saved me 2 cycles. Every little helps, especially for code running 10 or 15 times a frame.

    I don't know about Sonic 2, but in Sonic 1 priority level 7 isn't used. Removing it saves $80 bytes RAM and some CPU time in BuildSprites.
     
  7. OrionNavattan

    OrionNavattan

    Tech Member
    167
    165
    43
    Oregon
    Sonic 2 uses all eight priority levels (0-7), so unfortunately it still needs the full sprite queue. RealMalachi's optimization still applies regardless. (Taking a nod from RISC architectures and loading values into a register to perform consecutive operations before writing back to memory always saves a few cycles, since we don't have to perform memory reads/writes for each operation.)
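
    To put numbers on that, here is the queue-insert tail in both styles, using the standard MC68000 timings (assuming no wait states). The memory-operand version follows the same pattern as the prototype listing posted earlier in the thread; both are excerpts, with .full being the routine's existing exit label.

```asm
; Memory-operand version                             cycles
        cmpi.w  #$7E,(a1)               ; 12
        bcc.s   .full                   ;  8  (branch not taken)
        addq.w  #2,(a1)                 ; 12
        adda.w  (a1),a1                 ; 12
        move.w  a0,(a1)                 ;  8  total 52

; Register version
        move.w  (a1),d0                 ;  8
        cmpi.w  #$7E,d0                 ;  8
        bcc.s   .full                   ;  8  (branch not taken)
        addq.w  #2,d0                   ;  4
        move.w  d0,(a1)                 ;  8
        move.w  a0,(a1,d0.w)            ; 14  total 50
```

    The register version has one more instruction but touches memory fewer times, which is where the 2 cycles come from.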
     
  8. I think you could potentially get away with removing priority 7 in Sonic 2, as only three objects ever use it (the big pylon in CPZ, the CNZ boss, and the orbs in MTZ's boss).
     
  9. RealMalachi

    RealMalachi

    you can call me mal Member
    Heh, thanks. And yeah you can definitely get away with removing priority 7, MTZ boss orbs might need some finetuning but the others should just work
     
  10. OrionNavattan

    OrionNavattan

    Tech Member
    167
    165
    43
    Oregon
    Updated with @RealMalachi's optimization and fixed installation instructions (I missed an additional object that loads priority values from an array and two others that calculate another object's priority from their own).