
ASM Mega Drive Opacity

Discussion in 'Technical Discussion' started by Kilo, Apr 2, 2025.

  1. Kilo

    Kilo

    The Scatterbrained Hacker Tech Member
    Canada
    Sonic 1 Source Code Recration
    I wanted to do a fade-in effect for my project's splash screen while having a scrolling background behind it. I initially thought about just pre-rendering it, but I wanted to keep the ROM small for DAC stuff later. So I threw together a little subroutine that renders uncompressed tiles with dithering.
    Code (Text):
    ; Input
    ; a1 - Address of uncompressed art
    ; a6 - VDP data/output
    ; d1 - Number of tiles
    ; d2 - Opacity level (0 - 0%, 1 - 25%, 2 - 50%, 3 - 75%, 4 - 100%)
    ; Clobbers d1-d4/a1-a2.
    VDP_LoadOpacityTiles:
            subq.w  #1,d1       ; dbf counts the full word, so adjust it as a word.
            lsl.w   #3,d2       ; 8 bytes per opacity table entry.
            lea     (Opacity_Table).l,a2
            adda.w  d2,a2       ; Get index into opacity table.
            move.l  (a2)+,d2    ; Store opacity line 1 (even rows).
            move.l  (a2),d3     ; Store opacity line 2 (odd rows).
    @Loop:
        rept 4
            move.l  (a1)+,d4    ; Get tile data.
            and.l   d2,d4       ; Apply opacity line 1.
            move.l  d4,(a6)     ; Write to VDP.
            move.l  (a1)+,d4    ; Same again for the next row,
            and.l   d3,d4       ; using opacity line 2.
            move.l  d4,(a6)     ; Write to VDP.
        endr
            dbf     d1,@Loop
            rts
    Opacity_Table:
        ; 0%
            dc.l    $00000000
            dc.l    $00000000
        ; 25%
            dc.l    $00000000
            dc.l    $0F0F0F0F
        ; 50%
            dc.l    $0F0F0F0F
            dc.l    $F0F0F0F0
        ; 75%
            dc.l    $FFFFFFFF
            dc.l    $0F0F0F0F
        ; 100%
            dc.l    $FFFFFFFF
            dc.l    $FFFFFFFF
    An example of its usage would look like:
    Code (Text):
            move.l  #(($0000&$3FFF)<<16)|(($0000&$C000)>>14)|$40000000,(vdp_ctrl).l ; VDP command
            move.w  #49,d1              ; Tile count.
            moveq   #2,d2               ; 50% opacity.
            lea     (Test_Tiles).l,a1   ; Tiles address.
            lea     (vdp_data).l,a6     ; VDP data address.
            bsr.s   VDP_LoadOpacityTiles
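The VDP command expression in the first line of that example can be hard to read at a glance. As a rough illustration only (Python rather than assembler, and `vram_write_command` is a hypothetical helper name, not something from the post), this sketch computes the same longword for an arbitrary VRAM address:

```python
def vram_write_command(vram_addr: int) -> int:
    """Build the 32-bit VDP control longword for a VRAM write at vram_addr."""
    if not 0 <= vram_addr <= 0xFFFF:
        raise ValueError("VRAM address must fit in 16 bits")
    low14 = (vram_addr & 0x3FFF) << 16   # address bits 13-0 land in bits 29-16
    high2 = (vram_addr & 0xC000) >> 14   # address bits 15-14 land in bits 1-0
    return 0x40000000 | low14 | high2    # $40000000 is the constant from the example


print(f"${vram_write_command(0x0000):08X}")  # the address used in the example
print(f"${vram_write_command(0xC000):08X}")  # an address that uses the two high bits
```

With the address `$0000` used in the post, both masked terms are zero and the command is just `$40000000`.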
    Here's a look at the quick test example I wrote:
    [Image: ezgif-1b681b0a3ac8e7.gif]
    Obviously 0% and 100% are redundant on their own, but the expectation is that you're fading graphics in and out through counters or math or something, so that's why they're supported.
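To see why each pair of masks gives the stated opacity, here is a small Python model of the masking step. It is only a sketch: the table values are copied from `Opacity_Table`, but `apply_opacity` and `coverage` are hypothetical names, and the test tile is assumed to be solid colour 1 so that every surviving pixel is non-zero.

```python
# Mask pairs copied from Opacity_Table: (even rows, odd rows) for each level.
OPACITY_TABLE = [
    (0x00000000, 0x00000000),  # 0%
    (0x00000000, 0x0F0F0F0F),  # 25%
    (0x0F0F0F0F, 0xF0F0F0F0),  # 50%
    (0xFFFFFFFF, 0x0F0F0F0F),  # 75%
    (0xFFFFFFFF, 0xFFFFFFFF),  # 100%
]


def apply_opacity(tile_rows, level):
    """AND each 32-bit row (8 pixels, 4 bits each) with the mask for its row parity."""
    masks = OPACITY_TABLE[level]
    return [row & masks[i % 2] for i, row in enumerate(tile_rows)]


def coverage(level):
    """Fraction of pixels left non-zero when a solid 8x8 tile is masked."""
    rows = apply_opacity([0x11111111] * 8, level)
    kept = sum(1 for row in rows for shift in range(0, 32, 4) if (row >> shift) & 0xF)
    return kept / 64


for level in range(5):
    print(level, coverage(level))
```

Each level keeps 0, 16, 32, 48 or 64 of the tile's 64 pixels, which matches the 0% to 100% labels in the table.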

    Edit: Optimized using Markey's suggestion.
     
    Last edited: Apr 5, 2025
  2. MarkeyJester

    MarkeyJester

    You smash your heart against the rocks Resident Jester
    Japan
    I like seeing people try out software rendering, it's a nice neat little bit of code.

    One suggestion: load the two long-word mask patterns into data registers before the loop, and mask with those registers inside the loop, since it's always the same pattern every two lines. You'll save yourself heaps of cycle time.
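As a back-of-the-envelope illustration of that saving, the sketch below counts the bus reads avoided by keeping the masks in registers. It assumes the pre-optimization loop fetched a longword mask from memory for every tile row, which the thread does not actually show, and `mask_reads_saved` is a hypothetical name.

```python
MASKS_PER_TILE = 8  # one 32-bit mask per tile row, 8 rows per tile


def mask_reads_saved(tiles: int) -> int:
    """Bus reads avoided when the masks live in data registers.

    Assumption: the unoptimized loop read each mask from memory as a longword,
    which costs two 16-bit bus reads on the 68000. A mask already held in a
    data register costs none.
    """
    reads_per_memory_mask = 2
    return tiles * MASKS_PER_TILE * reads_per_memory_mask


print(mask_reads_saved(49))  # the 49-tile demo from the first post
```

Under that assumption the 49-tile demo avoids 784 bus reads per render.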
     
  3. Kilo

    Kilo

    The Scatterbrained Hacker Tech Member
    Canada
    Sonic 1 Source Code Recration
    Oh! Thanks for the suggestion, I hadn't thought of that. I really should take the time to learn more about cycles, to be honest. Hopefully it proves to be a useful optimization for me, since the demo takes at least 2-3 frames to render 49 tiles (with absolutely nothing else happening), and my project will demand a lot more. Went ahead and edited the original post.
     
    Last edited: Apr 5, 2025
  4. Devon

    Devon

    pfp by @litchui on Twitter Tech Member
    your mom
    A good rule of thumb to remember when it comes to cycle times is that a bus access lasts at least 4 cycles (it can last longer if the responding device is slow). The 68000 has 16 data lines, which means it reads/writes data in memory 16 bits (1 word) at a time (byte accesses are done by selecting either the top or bottom byte of a word, which is why byte and word access times are the same). Every instruction will take at minimum 4 cycles, since the base opcode has to be read from memory. If the instruction needs extra info to execute, then it will take more accesses to retrieve all of it, 1 word at a time. On top of that, if the instruction itself is tasked with doing bus accesses, then you have to count those as well.

    For example, let's take this line of code:
    Code (ASM):
        move.l  $FF0000,$FF1000

    Which assembles into this:
    Code (Text):
        23F9 00FF 0000 00FF 1000

    This instruction takes up 5 words (1 for the opcode, 2 for the source address, 2 for the destination address), so we start off with 5*4, or 20, cycles right off the bat. Then it has to read a longword from 0xFF0000 and store it into 0xFF1000. That's 2*4, or 8, cycles for the read and another 2*4, or 8, cycles for the write. Add them up and you get a whopping 36 cycles.

    Whenever you look up documented cycle times for 68000 instructions (like in here), you'll see them styled as "n(r/w)". "n" is the total number of cycles, "r" is the number of bus reads, and "w" is the number of bus writes. The instruction in my example is documented as "36(7/2)", where it counts up the total number of reads (5 for reading the instruction, 2 for reading from 0xFF0000) and writes (2 for writing to 0xFF1000). Now, some other instructions will take up extra time outside of the bus accesses (especially with multiplication and division), so the total number of cycles will not always be exactly the number of bus accesses * 4, but this rule of thumb is still worth knowing, since it is a big contributor regardless.
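The rule of thumb above can be written as a tiny calculator. This is only a sketch of the bus-access estimate described in this post (the name `estimate_cycles` is hypothetical), and it will undercount instructions that spend extra time outside bus accesses:

```python
BUS_ACCESS_CYCLES = 4  # minimum cycles per 16-bit bus access


def estimate_cycles(instruction_words, data_reads=0, data_writes=0):
    """Return (cycles, reads, writes) from the bus-access rule of thumb alone."""
    reads = instruction_words + data_reads  # fetching the instruction counts as reads
    writes = data_writes
    return (reads + writes) * BUS_ACCESS_CYCLES, reads, writes


# move.l $FF0000,$FF1000: 5 instruction words, 2 word reads, 2 word writes
cycles, reads, writes = estimate_cycles(5, data_reads=2, data_writes=2)
print(f"{cycles}({reads}/{writes})")  # 36(7/2)
```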

    This is why caching things into registers helps out a lot, since the 68000 can immediately access them and not have to deal with the bus, which is definitely important when running code in a loop like with your function.

    As a side note, this is why it's a good idea to shorten addresses to words instead of longwords where possible. The 68000 reads the word and sign-extends it into a longword, and the top byte is then ignored, because the 68000 only has 23 address lines plus upper/lower byte selects, so only the bottom 3 bytes of an address ever reach the bus. You save cycles by not having to read another word to complete the address. This is why you often see the latter half of work RAM used this way (0x8000 through 0xFFFF are extended into 0xFFFF8000 through 0xFFFFFFFF, which are then treated as 0xFF8000 through 0xFFFFFF), but you can also do this for the first 0x8000 bytes of ROM (0x0000 through 0x7FFF are extended and treated as 0x000000 through 0x007FFF).
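The sign extension described above can be modelled in a few lines. This is a sketch in Python rather than anything the assembler does for you, and both function names are hypothetical:

```python
def absolute_short_target(word: int) -> int:
    """24-bit address reached by a 16-bit (word-sized) absolute address."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("operand must be a 16-bit word")
    extended = word | 0xFFFF0000 if word & 0x8000 else word  # sign-extend to 32 bits
    return extended & 0xFFFFFF  # only the bottom 24 bits reach the bus


def fits_absolute_short(address: int) -> bool:
    """True if a word-sized address can reach this location."""
    address &= 0xFFFFFF
    return address <= 0x7FFF or address >= 0xFF8000


print(hex(absolute_short_target(0x8000)))  # start of the upper half of work RAM
print(fits_absolute_short(0xFF0000))       # needs a full longword address
```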
     
    Last edited: Apr 5, 2025
  5. Brainulator

    Brainulator

    Regular garden-variety member Member
    Another source of cycle times is here, courtesy of @MarkeyJester (although the DIVS, DIVU, MULS, and MULU instructions are missing).