Making Sonic 2's sound driver good

Discussion in 'Engineering & Reverse Engineering' started by Clownacy, Mar 17, 2015.

  1. Clownacy

    Clownacy

    Tech Member
    802
    36
    28
    Because of its relation to S1's driver (it's a fixed Z80 port of it), I've been fooling around with S2's driver more than ever, and I've found some interesting things. So what say we give this driver some level of technical competence?

    I have quite a few changes here that eventually link with each other, so if you're gonna follow along, don't skip anything. Also, flamewing's stuff ties into this, so check that out beforehand. It doesn't hurt to check out the other improvements scattered around my driver's topic (and here, and here, and here).

    Optimising zWriteFMI/zWriteFMII
    What's the first thing you notice when comparing S3K's zWriteFMI to S2's? Besides the backing-up, and restoring of af, S2's calls something called zFMBusyWait; S3K doesn't.

    You might have heard, if you've read through its topic, that my driver wasn't the best at waiting for the YM2612. That's what this is for: making sure the processor doesn't write to it too fast. Thing is, while S2 checks and waits, S3K just waits a set amount of time, evident by its use of a 'nop'. So, I guess we can remove S2's check, pop in a 'nop' and free up some space for another rsttarget, along with making zWriteFMI and zWriteFMII smaller.

    This is simple enough. Just remove the two calls to zFMBusyWait in each piece of code, and then remove the now-useless 'push/pop af' pair at the top of them.

    Note that you'll want to keep the remaining pairs, which are towards the bottom. Unlike S3K, S2 relies on register 'a' not being modified. The upside to this is that the 'push af' makes a good substitute for S3K's 'nop': occupying the Z80 for more than enough time for the YM2612 to catch up.

    With that done, you can remove zFMBusyWait: everything from the 'align 8' to the 'End of function'. Since we've just shifted all of the later code, change zVInt's align to 38h, so it isn't affected.

    Current changes:
    • Free rsttarget slot in 38h range
    • Smaller zWriteFMI
    • Smaller zWriteFMII

    Adding common bankswitch subroutine
    With these Z80 drivers, when it comes to just about anything other than the DAC component, you generally want to size-optimise. Here, we'll be doing so by eliminating the repeating, redundant code that is this thing's many bankswitches. Flamewing's code introduced a dynamic bankswitch subroutine, that changes to a bank determined by the contents of register 'a'. Compare this to S2's usual bankswitching code, which is hardcoded. Granted, S2's driver did try, at least, by making the hardcoded zBankSwitchToMusic a subroutine, but that thing had two sets of hardcoded bankswitching in there. We can do better.

    First up, remove the 'bankswitch' macro. Now, out with the old, and in with the new: time to repurpose flamewing's zBankSwitchToMusic.

    As you can see, zBankSwitchToMusic is somewhat hardcoded in its current state: register 'a' is changed to whatever zMusicBankNumber is holding. This is great for music, but not so much for everything else.

    Below the 'ld a,(zMusicBankNumber)' instruction, add the label 'zBankSwitch'.

    Now, search for the remnants of the 'bankswitch' macro. It's still being used in some areas. They take this form:

    Code (ASM):
    1.     bankswitch SoundIndex
    We'll want to change them to look like this:

    Code (ASM):
    1.     ld  a,zmake68kBank(SoundIndex)
    2.     call    zBankSwitch
    Do this for everything that uses the macro. The new usage of the zmake68kBank function means that you'll have to move it to the top of the file. Just pop it underneath the zmake68kPtr function.

    But wait, we're not done here. This may be functional, and we may have saved a fair bit of space in doing so, but we can take this further. Quite a bit further...

    Current changes:
    • Free rsttarget slot in 38h range
    • Smaller zWriteFMI
    • Smaller zWriteFMII
    • Smaller bankswitch

    Improving common bankswitch subroutine
    Count how many times zBankSwitchToMusic is called, and count how many times zBankSwitch is called with 'ld a,zmake68kBank(SoundIndex)' before it (it has to be SoundIndex). They're both called plenty of times, but note also that the former is done more than the latter. Wouldn't it be great if we could eliminate all those duplicate 'ld a,zmake68kBank(SoundIndex)'s? Heck, what if we could shorten zBankSwitchToMusic's calls?

    Now, at the moment, we can't just put a 'ld a,zmake68kBank(SoundIndex)' above zBankSwitch, because zBankSwitchToMusic is there, and moving that elsewhere, and making it use a 'jr' to reach zBankSwitch, just seems like the sub-optimal way. This is where our first tweak comes in: how about moving zBankSwitchToMusic to that now-free rsttarget slot, and creating a new zBankSwitchToSound in its old place?

    Take zBankSwitchToMusic and its lone instruction, and put it underneath zPalModeByte. Above its label, add 'align 8', and after its one instruction, add 'jp zBankSwitch'. Now we can begin calling this code, not with 'call's, but with the smaller 'rst's. Hey, save all the space we can, right?

    Find all 'call zBankSwitchToMusic's, and change them to 'rst zBankSwitchToMusic'. Note that there exists a 'jp zBankSwitchToMusic'. That, you'll have to replace with a 'rst zBankSwitchToMusic' followed by a 'ret'.

    Now that that's out of the way: above zBankSwitch, add this:

    Code (ASM):
    1. zBankSwitchToSound:
    2.     ld  a,zmake68kBank(SoundIndex)
    Following this, find all of the areas that use 'ld a,zmake68kBank(SoundIndex)' + 'call zBankSwitch', and replace them with 'call zBankSwitchToSound'.

    And with that, bankswitching to music and SFXs now requires less space. The reason zBankSwitchToMusic is an rsttarget instead of zBankSwitchToSound, is that zBankSwitchToMusic is used more often, so it saves more space than zBankSwitchToSound.

    Current changes:
    • Smaller zWriteFMI
    • Smaller zWriteFMII
    • Smaller bankswitch
    • Optimised bankswitching for music and SFX

    Optimising DAC playback loop
    Let's start off small. Above zloc_1A8, you can see a 'push/pop af' pair. These don't seem to have any purpose besides slowing sample playback down, so remove them. If you want the two halves of the playback loop to have equal cycle counts, insert a 'nop' after the 'dec de'.

    You'll notice that the DACs are now higher-pitched. To fix this, go to zDACMasterPlaylist, and insert a '+1' after each sample's tempo (the second value).

    Up next is an improvement from Mega PCM.

    In zWriteToDAC, you can see that each half of the playback loop writes 2Ah to zYM2612_A0. This is done constantly in case zYM2612_A0 has been set to something else, usually by zWriteFMI or zWriteFMII. The problem is that this is done in the playback loop, affecting playback speed. It would be better to make it so that, whenever zYM2612_A0 is changed by something, it's set back to 2Ah once it's done.

    Start by removing the two 'ld a,2Ah' + 'ld (zYM2612_A0),a' pairs, and then go to zWriteFMI and zWriteFMII, and, before the 'pop af', add this:

    Code (ASM):
    1.     ld  a,2Ah
    2.     ld  (zYM2612_A0),a
    Now the DAC pitch will be even higher. Change zDACMasterPlaylist's '+1's to '+2'. You might want to experiment with if 2 or 3 suits you, since neither are precise to the original pitch.

    There's one last change we can make, which is just a faster way of offsetting into zDACDecodeTbl.

    Take zDACDecodeTbl, and 'put 'align 100h' above it (replacing the ensure1byteoffset). Try moving this around to somewhere where it doesn't create much padding. Then, in zWriteToDAC, find 'ld (zloc_18B+2),a', and change it to 'ld iyl,a'. You can remove the zloc_18B label now. Do the same with 'ld (zloc_1A8+2),a'.

    Current changes:
    • Smaller bankswitch
    • Optimised bankswitching for music and SFX
    • Optimised DAC playback loop

    Adding multiple DAC banks
    Sonic 2's driver limits itself to a single DAC bank. Moving on:

    We'll need to create a DAC bank list, not unlike DAC_Banks in S3K's driver. You'll want to put it above zSaxmanDec. The idea is pretty simple: for each DAC sample you have, there's a 'db zmake68kBank(SndDAC_SampleX)'. S2 by default has 7, so you get the idea. You don't need to have a dummy value at the beginning, like S3K did. Name this list 'zDACBanks', and, above the label, place an 'ensure1byteoffset'. Set it to the size of the list (7 by default). Also, at zDACMasterPlaylist, change idstart to 80h. This will make some future calculations easier.

    Immediately under zUpdateDAC, you can see the bankswitch which we need to change, but first, scroll down a bit to find 'sub 81h'. Our changing of zDACMasterPlaylist means that we have to change this, but, hey, we can save some space while doing this. Just change this...

    Code (ASM):
    1.     ld  a,(zCurDAC)         ; Get current DAC sound
    2.     sub 81h                 ; Subtract 81h (first DAC index is 81h)
    3.     ld  (zCurDAC),a         ; Store that as current DAC sound
    ...to this:

    Code (ASM):
    1.     ld  hl,zCurDAC          ; Get address of 'current DAC sound' value
    2.     res 7,(hl)              ; Subtract 80h (first DAC index is 80h)
    3.     ld  a,(hl)              ; Get current DAC sound value
    Now that that's done, scroll back up to the bankswitch, and replace the 'ld a,zmake68kBank(SndDAC_Start)' with this:

    Code (ASM):
    1.     ld  a,(zCurDAC)             ; Get currently playing DAC sound
    2.     and 7Fh                 ; Strip 'queued' bit
    3.     add a,zDACBanks&0FFh            ; Offset into list
    4.     ld  (.writeme+1),a              ; Store into the instruction after .writeme (self-modifying code)
    5. .writeme:
    6.     ld  a,(zDACBanks)               ; Get DAC's bank value
    Now you can try splitting your DACs amongst multiple banks, and the driver will handle it just fine.

    Current changes:
    • Smaller bankswitch
    • Optimised bankswitching for music and SFX
    • Optimised DAC playback loop
    • Support for multiple DAC sample banks

    Adding automatic DAC bankswitching
    By doing this, we can do away with aligning our samples to banks, and not have to worry about crossing any bank boundaries. This feature comes from Sonic Crackers, with adjustments from Mega PCM.

    We'll first have to implement a byte of RAM containing the current DAC bank. This will require modifications to zUpdateDAC. Go there, and take your new code, and the call to zBankSwitch, and move it down to above the 'ld a,80h'. Since this code determines which bank to load, we only want to calculate this when a sample is started, since the sample can continue into another bank during playback. Moving the code to an area processed only when a new sample is played should do the trick.

    Next up, remember these?

    Code (ASM):
    1.     ld  a,(zCurDAC)             ; Get currently playing DAC sound
    2.     and 7Fh                 ; Strip 'queued' bit
    Code (ASM):
    1.     ld  hl,zCurDAC          ; Get current DAC sound
    2.     res 7,(hl)              ; Subtract 81h (first DAC index is 81h)
    3.     ld  a,(hl)              ; Store that as current DAC sound
    Swap them around, and delete the 'and 7Fh'. This is just an optimisation to not have the same thing be done twice, which we can only do now that the code's been moved.

    After 'ld a,(zDACBanks)', add 'ld (zCurrentDACBank),a'.

    Then, back up at zUpdateDAC, before the 'exx', add this:

    Code (ASM):
    1.     ld  a,(zCurrentDACBank)         ; Get current DAC bank
    2.     call    zBankSwitch             ; Switch to current DAC sample's bank
    Now, under zPaused, define zCurrentDACBank with 'db 0'.

    Now the way this works should be obvious: the initial bank is determined by the code we added earlier. Its value is stored away for later. When needed, this value is read so the DAC playback can resume.

    With the current-bank-value system added, we can get to adding the dynamic bankswitching. Go to zWriteToDAC, and scroll down to the bottom. Above the 'jp zWaitLoop' instruction, add this:

    Code (ASM):
    1.     bit 7,h         ; has bank boundary been crossed?
    2.     jr  nz,zWaitLoop        ; if not, branch
    3.     ld  h,80h           ; correct address so it points to start of bank
    4.     di
    5.     push    hl         
    6.     ld  hl,zCurrentDACBank
    7.     inc (hl)            ; set zCurrentDACBank to the next bank, since the boundary's been crossed
    8.     ld  a,(hl)
    9.     call    zBankSwitch     ; bankswitch to this new bank
    10.     pop hl
    11.     ei
    The comments should be able to describe what's happening there. You can now go to SndDAC_Start, and remove the cnop alignment, sample size checks, all that. You can even remove the Size_of_DAC_samples constant completely. You're free to have your samples wherever you want.

    Current changes:
    • Smaller bankswitch
    • Optimised bankswitching for music and SFX
    • Optimised DAC playback loop
    • Support for multiple DAC sample banks
    • Automatic DAC bankswitching

    From here, these are just small changes that don't impact each other.

    Saving a bank
    flamewing's change had the SFXs separated from the music bank, but the SFX are very small, and most of its new bank goes wasted. You can fit the SEGA sound in the start of the SFX bank quite snugly, making better use of the space.

    Take Snd_Sega, Snd_Sega_End, and their BINCLUDE, and move them to above SoundIndex. Then delete the 'cnop -Size_of_SEGA_sound, $8000', and both of the checks below it. You can also remove Size_of_SEGA_sound in general.

    Making the jump tables smaller
    Nice and easy. Don't get why Sonic Team (and the ones that wrote OutRunners' driver) did this so strangely.

    Starting with zloc_6D5, remove all of the 'nops'. Then, change the two 'add a,a's to this:

    Code (ASM):
    1.     ld  c,a
    2.     add a,a
    3.     add a,c
    Also do this to coordflagLookup. Don't forget to correct the 'ensure1byteoffset's to match the new sizes.

    We can take this a step further, and replace the 'jp's with 'dw's.

    Replace the 'jp's with 'dw's in both zloc_6D5 and coordflagLookup.

    Now, at coordflagLookup, remove the 'jr $'. Then, replace everything from (not including) 'sub 0E0h' until (including) 'inc hl' with this:

    Code (ASM):
    1.     add a,a
    2.     add a,coordflagLookup&0FFh
    3.     ld  (.writeme+2),a
    4. .writeme:
    5.     ld  bc,(coordflagLookup)
    6.     ld  (.writeme2+1),bc ; store into the instruction after coordflagLookup (self-modifying code)
    7.     ld  a,(hl)
    8.     inc hl
    9. .writeme2:
    10.     jp  $
    Next up, at zloc_6D5, remove the 'jr $'. Then, replace everything from (not including) 'sub MusID_StopSFX' until (including) 'ld (zloc_6D5+1),a' with this:

    Code (ASM):
    1.     add a,a
    2.     add a,zloc_6D5&0FFh
    3.     ld  (.writeme+1),a
    4. .writeme:
    5.     ld  hl,(zloc_6D5)
    6.     jp  (hl)
    Again, don't forget to correct the 'ensure1byteoffset's.
     
  2. flamewing

    flamewing

    Emerald Hunter Tech Member
    1,138
    0
    16
    France
    Sonic Classic Heroes; Sonic 2 Special Stage Editor; Sonic 3&K Heroes (on hold)
    This one makes perfect sense for MegaPCM, but is a bit excessive in the context of z80 drivers, such as S2: you actually need to do this only in the z80 V-Int function when it is about to return. The reason is that all calls to zWriteFMI and zWriteFMII will happen during V-Int, and the DAC playback loop will be halted for its duration; so it is immaterial. Mega PCM, on the other hand, is still running its DAC loop while the 68k updates songs, so every YM2612 command it sends needs to restore the register to the correct value.

    Other than this, excellent stuff.
     
  3. Advanced?

    Advanced?

    Member
    57
    19
    8
    Downunda, Mobius
    Sonic the Hedgehog 2+
    Above Do_Updates you'll see this piece of code:
    Code (Text):
    1.     cmpi.b    #$5C,(Hint_counter_reserve+1).w
    2.     bhs.s    Do_Updates
    3.     move.b    #1,(Do_Updates_in_H_int).w
    4.     rts
    Delete it and at the bottom of PalToCRAM you'll see this:
    Code (Text):
    1.  
    2.     tst.b    (Do_Updates_in_H_int).w
    3.     bne.s    loc_1072
    4.     rte
    5. ; ===========================================================================
    6.  
    7. loc_1072:
    8.     clr.b    (Do_Updates_in_H_int).w
    9.     movem.l    d0-a6,-(sp)
    10.     bsr.w    Do_Updates
    11.     movem.l    (sp)+,d0-a6
    12.     rte
    Replace all of it with a "rte"; this is a holdover from the Sonic 1 driver to prevent SMPS from lagging on the vertical interrupts (and even then, it wasn't a good fix as vladikcomper explains). As Sonic 2's sound driver doesn't murder your interrupts, its unneeded.