
ASM 68000 unrolled loop copy subroutine

Discussion in 'Technical Discussion' started by OrionNavattan, May 12, 2024.

  1. OrionNavattan

    OrionNavattan

    Tech Member
    182
    180
    43
    Oregon
    A small but potentially useful tool born out of my explorations of Sonic CD Mode 1, with some inspiration from MarkeyJester's TwizzlerDec: a universal mass copy subroutine. It combines a pair of unrolled loops (one transferring 1 kilobyte via longword moves, the other 512 bytes via byte moves) with code that allows copying any amount of data from 1 to $3FFFF bytes (one byte shy of 256 KB), with no alignment requirements. The code uses longword moves whenever possible: if source and destination are both odd, it first copies a single byte to bring both to an even address, and it falls back to byte moves only when one address is even and the other odd, or when the two addresses are less than 4 bytes apart. Keeping in mind how Sonic CD uses unrolled loops in a couple of places, the unrolled loops have labels so they can be called directly with a specific byte amount (e.g., MassCopy_512). While the speed boost from handling some byte copies as longwords is significant (20 cycles to copy four bytes as opposed to 48), the usefulness of the routine is somewhat limited by the overhead of the code that inspects the source and destination addresses. Nevertheless, it correctly copies any amount of data in that range.
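
    To make the path selection above easier to follow, here is a rough model of it in Python rather than 68000 assembly. It is only a sketch: the names plan_copy, move_cycles and the dictionary keys are made up for illustration and do not appear in the routine, the byte-path remainder is assumed to be whatever is left after whole 512-byte blocks, and the cycle figures count only the move instructions (20 per longword move as stated above, 12 per byte move, i.e. 48 per four bytes), not the setup overhead.

```python
LONG_MOVE_CYCLES = 20  # move.l (a0)+,(a1)+, figure quoted in the post
BYTE_MOVE_CYCLES = 12  # move.b (a0)+,(a1)+, i.e. 48 cycles per four bytes


def plan_copy(src: int, dst: int, size: int) -> dict:
    """Model which path the routine takes and how the size is split up."""
    if size == 0:
        return {"path": "none"}
    # Byte path: addresses less than 4 bytes apart, or one even and one odd.
    if abs(dst - src) < 4 or (src ^ dst) & 1:
        return {"path": "byte", "blocks_512": size >> 9, "tail_bytes": size & 0x1FF}
    lead = src & 1          # both odd: copy one byte first to reach even addresses
    size -= lead
    longs = size >> 2       # total longwords to copy
    return {
        "path": "long",
        "lead_bytes": lead,
        "blocks_1024": longs >> 8,   # whole kilobytes (256 longwords each)
        "tail_longs": longs & 0xFF,  # longwords left after whole kilobytes
        "tail_bytes": size & 3,      # bytes left after the last longword
    }


def move_cycles(plan: dict) -> int:
    """Cycles spent in the move instructions alone (no call/setup overhead)."""
    if plan["path"] == "none":
        return 0
    if plan["path"] == "byte":
        return (plan["blocks_512"] * 512 + plan["tail_bytes"]) * BYTE_MOVE_CYCLES
    longs = plan["blocks_1024"] * 256 + plan["tail_longs"]
    loose_bytes = plan["lead_bytes"] + plan["tail_bytes"]
    return longs * LONG_MOVE_CYCLES + loose_bytes * BYTE_MOVE_CYCLES
```

    For a 1 KB copy, the model gives 5120 cycles of moves on the longword path against 12288 on the byte path, which is where the gain comes from whenever the addresses allow it.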

    For flexibility of implementation, I've used register equates for the source and destination registers. The AS version originally lacked the labels within the unrolled loop, as I didn't think AS allowed using string variables to make dynamically generated labels. That is now fixed thanks to MainMemory; the AS version is fully identical to the ASM68K version. :>

    Github Repo: https://github.com/OrionNavattan/68000-MassCopy

    Code (ASM):
    ; ASM68K
    srcreg:    equr a0
    destreg:   equr a1

    ; -------------------------------------------------------------------------
    ; Unrolled loop to perform a mass copy of up to 1 kilobyte of data via
    ; longword moves. Can be called for any multiple of 4 bytes between 4 and
    ; 1024 by placing the number of bytes after "MassCopy_";
    ;  e.g., "jsr (MassCopy_128).l".

    ; If the data size is larger than 1 KB or is not divisible by a longword,
    ; use the "MassCopy" subroutine below.

    ; input:
    ;   srcreg.l = source
    ;   destreg.l = destination
    ; -------------------------------------------------------------------------

    genmasscopy:   macro

           lblnum:   equs "\#c"               ; number used in label

    MassCopy_\lblnum:
           move.l   (srcreg)+,(destreg)+
           c: = c-4                   ; decrement label number
           endm

           c: = 1024

           rept c/4
           genmasscopy
           endr

    MassCopy_Base:   ; used for dynamic calls into the above
    MassCopy_Done:
           rts

    ; -------------------------------------------------------------------------
    ; Subroutine to perform a mass copy of up to $3FFFF bytes of data.
    ; Sets up/manages calls to the unrolled loop, automatically switching to
    ; byte moves if only one of source or destination is odd (or if they are
    ; less than 4 bytes apart), and dealing with any remainder if longword
    ; moves are used.

    ; input:
    ;   srcreg.l = source
    ;   destreg.l = destination
    ;   d0.l = size of data to copy in bytes

    ; example usage, assuming source and destination are a0 and a1:
    ;   lea source(pc),a0
    ;   lea (destination).w,a1
    ;   move.l #size,d0
    ;   jsr   (MassCopy).l

    ; uses d1.l (byte moves only), d3.w, d4.l, d5.l
    ; -------------------------------------------------------------------------

    MassCopy:
           tst.l   d0
           beq.s   MassCopy_Done               ; exit if size is 0

           move.l   srcreg,d4
           move.l   destreg,d5
           sub.l   d4,d5
           bpl.s   .positive
           neg.l   d5

       .positive:
           cmpi.l   #4,d5               ; d5 = difference between source and destination addresses
           bcs.w   MassCopy_Byte       ; fall back to bytes if less than 4 (can happen with long fills in RLE compression algorithms)

           move.w   destreg,d5
           moveq   #1,d3                   ; faster and smaller than using two 'andi #1's
           and.w   d3,d4                   ; d4 = 0 if source is even; 1 if odd
           and.w   d3,d5                   ; d5 = same as above, but for destination
           eor.w   d4,d5                   ; are source and destination both even or both odd?
           bne.w   MassCopy_Byte               ; branch if not (fall back to bytes)

           tst.b   d4                   ; are source and destination even?
           beq.s   .even                   ; branch if so

           move.b   (srcreg)+,(destreg)+           ; copy one byte to align source and destination to even
           subq.l   #1,d0                   ; minus 1 byte copied

       .even:
           move.l   d0,d5                   ; back up total size for later
           lsr.l   #2,d0                   ; d0 = total count of longwords to copy (divide total bytes by four)
           beq.w   MassCopy_FinishBytes           ; branch if fewer than 4 bytes total

           move.w   d0,d4                   ; back up total longwords for later
           lsr.w   #8,d0                   ; d0 = count of whole kilobytes (divide total longwords by 256)
           beq.s   .less_than_1kb               ; branch if less than 1 kilobyte total
           subq.w   #1,d0                   ; adjust for loop counter

       .longwordloop:
           bsr.w   MassCopy_1024               ; copy 1 kilobyte
           dbf   d0,.longwordloop           ; repeat for all whole kilobytes

           andi.w   #$FF,d4                   ; d4 = remaining longwords to copy
           beq.s   .nolongremainder           ; branch if 0 (0-3 bytes left over)

       .less_than_1kb:
           neg.w   d4                   ; invert remaining longword count
           add.w   d4,d4                   ; multiply by 2 (size of 'move.l (srcreg)+,(destreg)+') to make index
           jsr   MassCopy_Base(pc,d4.w)           ; jump to appropriate location in unrolled loop to copy remaining longwords

       .nolongremainder:
           andi.l   #3,d5                   ; d5 = remainder if data size was not divisible by a longword
           bne.w   MassCopy_FinishBytes           ; do any leftover bytes if necessary
           rts

    ; -------------------------------------------------------------------------
    ; Unrolled loop to perform a mass copy of up to 512 bytes of data via
    ; byte moves. While this can be used the same way as the MassCopy loop, it
    ; is recommended to call MassCopy instead as that will automatically
    ; optimize the operation to longword moves if the source and destination
    ; addresses share the same alignment. You MUST use MassCopy if you have
    ; more than 512 bytes to copy.

    ; input:
    ;   srcreg.l = source
    ;   destreg.l = destination
    ; -------------------------------------------------------------------------

    genmasscopyb:   macro

           lblnum:   equs "\#c"               ; number used in label

    MassCopyByte_\lblnum\:
           move.b   (srcreg)+,(destreg)+
           c: = c-1                   ; decrement label number
           endm

           c: = 512

           rept c
           genmasscopyb
           endr

    MassCopyByte_Base:                       ; used for dynamic calls into the above
    MassCopyByte_Done:
           rts

    ; -------------------------------------------------------------------------
    ; Similar to MassCopy, except byte-length moves are used

    ; uses d1.l, d5.l
    ; -------------------------------------------------------------------------

    MassCopy_Byte:
           move.l   d0,d5                   ; back up total byte count for later
           beq.s   MassCopyByte_Done           ; exit if size is 0
           moveq   #9,d1                   ; shift by 9 to divide by 512
           lsr.l   d1,d0                   ; d0 = count of half-kilobytes (512 bytes)
           beq.s   MassCopy_FinishBytes           ; branch if fewer than 512 bytes

           subq.w   #1,d0                   ; adjust for loop counter

       .loop512:
           bsr.w   MassCopyByte_512           ; copy 512 bytes
           dbf   d0,.loop512               ; repeat for all half-kilobytes

           andi.l   #$1FF,d5               ; d5 = count of remaining bytes (size modulo 512)
           beq.s   MassCopyByte_Done           ; branch if no remainder

    MassCopy_FinishBytes:
           neg.w   d5                   ; invert count of remaining bytes
           add.w   d5,d5                   ; multiply by 2 (size of 'move.b (srcreg)+,(destreg)+') to make index
           jmp   MassCopyByte_Base(pc,d5.w)       ; jump to appropriate location in unrolled loop to copy remaining bytes
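
    The computed jumps in the listing above (jsr MassCopy_Base(pc,d4.w) and jmp MassCopyByte_Base(pc,d5.w)) rely on every unrolled move being one 2-byte opcode, so entering the loop n moves before the Base label copies exactly n units before reaching the rts. A small Python model of that layout, as a sketch only: unrolled_labels and entry_offset are hypothetical helper names used to check the arithmetic and are not part of the routine.

```python
MOVE_SIZE = 2  # bytes of machine code per 'move (srcreg)+,(destreg)+'


def entry_offset(count: int) -> int:
    """Signed offset from the *_Base label, as built by neg.w / add.w."""
    return -count * MOVE_SIZE


def unrolled_labels(total: int, step: int, prefix: str) -> dict:
    """Map each generated label to its byte offset from the *_Base label.

    Mirrors the macros: the label number starts at `total` and drops by
    `step` for each move emitted, so the largest number is the first entry.
    """
    count = total // step
    return {
        f"{prefix}{total - i * step}": (i - count) * MOVE_SIZE
        for i in range(count)
    }


long_labels = unrolled_labels(1024, 4, "MassCopy_")
byte_labels = unrolled_labels(512, 1, "MassCopyByte_")
```

    In this model, calling MassCopy_128 directly and dispatching with 32 remaining longwords land on the same address, 64 bytes before MassCopy_Base. The byte loop spans 1024 bytes of code, which still fits the signed 16-bit index register used by the jump.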


    Code (ASM):
    ; AS
    srcreg:    reg a0
    destreg:   reg a1
       outradix   10   ; AS defaults to hexadecimal when converting integers to strings, so we need to force decimal

    ; -------------------------------------------------------------------------
    ; Unrolled loop to perform a mass copy of up to 1 kilobyte of data via
    ; longword moves. Can be called for any multiple of 4 bytes between 4 and
    ; 1024 by placing the number of bytes after "MassCopy_";
    ;  e.g., "jsr (MassCopy_128).w".

    ; If the data size is larger than 1 KB or is not divisible by a longword,
    ; use the "MassCopy" subroutine below.

    ; input:
    ;   srcreg.l = source
    ;   destreg.l = destination
    ; -------------------------------------------------------------------------

    genmasscopy:   macro

    lblnum       := "\{c}"           ; number used in label

    MassCopy_{lblnum}:   label *
       move.l   (srcreg)+,(destreg)+
    c := c-4                   ; decrement label number
       endm

    c := 1024

       rept c/4
       genmasscopy
       endm

    MassCopy_Base:   ; used for dynamic calls into the above
    MassCopy_Done:
       rts

    ; -------------------------------------------------------------------------
    ; Subroutine to perform a mass copy of up to $3FFFF bytes of data.
    ; Sets up/manages calls to the unrolled loop, automatically switching to
    ; byte moves if only one of source or destination is odd (or if they are
    ; less than 4 bytes apart), and dealing with any remainder if longword
    ; moves are used.

    ; input:
    ;   srcreg.l = source
    ;   destreg.l = destination
    ;   d0.l = size of data to copy in bytes

    ; example usage, assuming source and destination are a0 and a1:
    ;   lea source(pc),a0
    ;   lea (destination).w,a1
    ;   move.l #size,d0
    ;   jsr   (MassCopy).l

    ; uses d1.l (byte moves only), d3.w, d4.l, d5.l
    ; -------------------------------------------------------------------------

    MassCopy:
       tst.l   d0
       beq.s   MassCopy_Done               ; exit if size is 0

       move.l   srcreg,d4
       move.l   destreg,d5
       sub.l   d4,d5
       bpl.s   .positive
       neg.l   d5

    .positive:
       cmpi.l   #4,d5               ; d5 = difference between source and destination addresses
       bcs.w   MassCopy_Byte       ; fall back to bytes if less than 4 (can happen with long fills in RLE compression algorithms)

       move.w   destreg,d5
       moveq   #1,d3                   ; faster and smaller than using two 'andi #1's
       and.w   d3,d4                   ; d4 = 0 if source is even; 1 if odd
       and.w   d3,d5                   ; d5 = same as above, but for destination
       eor.w   d4,d5                   ; are source and destination both even or both odd?
       bne.w   MassCopy_Byte               ; branch if not (fall back to bytes)

       tst.b   d4                   ; are source and destination even?
       beq.s   .even                   ; branch if so

       move.b   (srcreg)+,(destreg)+           ; copy one byte to align source and destination to even
       subq.l   #1,d0                   ; minus 1 byte copied

    .even:
       move.l   d0,d5                   ; back up total size for later
       lsr.l   #2,d0                   ; d0 = total count of longwords to copy (divide total bytes by four)
       beq.w   MassCopy_FinishBytes           ; branch if fewer than 4 bytes total

       move.w   d0,d4                   ; back up total longwords for later
       lsr.w   #8,d0                   ; d0 = count of whole kilobytes (divide total longwords by 256)
       beq.s   .less_than_1kb               ; branch if less than 1 kilobyte total
       subq.w   #1,d0                   ; adjust for loop counter

    .longwordloop:
       bsr.w   MassCopy_1024               ; copy 1 kilobyte
       dbf   d0,.longwordloop           ; repeat for all whole kilobytes

       andi.w   #$FF,d4                   ; d4 = remaining longwords to copy
       beq.s   .nolongremainder           ; branch if 0 (0-3 bytes left over)

    .less_than_1kb:
       neg.w   d4                   ; invert remaining longword count
       add.w   d4,d4                   ; multiply by 2 (size of 'move.l (srcreg)+,(destreg)+') to make index
       jsr   MassCopy_Base(pc,d4.w)           ; jump to appropriate location in unrolled loop to copy remaining longwords

    .nolongremainder:
       andi.l   #3,d5                   ; d5 = remainder if data size was not divisible by a longword
       bne.w   MassCopy_FinishBytes           ; do any leftover bytes if necessary
       rts

    ; -------------------------------------------------------------------------
    ; Unrolled loop to perform a mass copy of up to 512 bytes of data via
    ; byte moves. While this can be used the same way as the MassCopy loop, it
    ; is recommended to call MassCopy instead as that will automatically
    ; optimize the operation to longword moves if the source and destination
    ; addresses share the same alignment. You MUST use MassCopy if you have
    ; more than 512 bytes to copy.

    ; input:
    ;   srcreg.l = source
    ;   destreg.l = destination
    ; -------------------------------------------------------------------------

    genmasscopyb:   macro

    lblnum       := "\{c}"           ; number used in label

    MassCopyByte_{lblnum}:   label *
       move.b   (srcreg)+,(destreg)+
    c := c-1                   ; decrement label number
       endm

    c := 512

       rept c
       genmasscopyb
       endm

    MassCopyByte_Base:                       ; used for dynamic calls into the above
    MassCopyByte_Done:
       rts

    ; -------------------------------------------------------------------------
    ; Similar to MassCopy, except byte-length moves are used

    ; uses d1.l, d5.l
    ; -------------------------------------------------------------------------

    MassCopy_Byte:
       move.l   d0,d5                   ; back up total byte count for later
       beq.s   MassCopyByte_Done           ; exit if size is 0
       moveq   #9,d1                   ; shift by 9 to divide by 512
       lsr.l   d1,d0                   ; d0 = count of half-kilobytes (512 bytes)
       beq.s   MassCopy_FinishBytes           ; branch if fewer than 512 bytes

       subq.w   #1,d0                   ; adjust for loop counter

    .loop512:
       bsr.w   MassCopyByte_512           ; copy 512 bytes
       dbf   d0,.loop512               ; repeat for all half-kilobytes

       andi.l   #$1FF,d5               ; d5 = count of remaining bytes (size modulo 512)
       beq.s   MassCopyByte_Done           ; branch if no remainder

    MassCopy_FinishBytes:
       neg.w   d5                   ; invert count of remaining bytes
       add.w   d5,d5                   ; multiply by 2 (size of 'move.b (srcreg)+,(destreg)+') to make index
       jmp   MassCopyByte_Base(pc,d5.w)       ; jump to appropriate location in unrolled loop to copy remaining bytes

       outradix 16       ; restore default outradix
     
    Last edited: May 19, 2024
  2. MainMemory

    MainMemory

    Kate the Wolf Tech Member
    4,797
    383
    63
    SonLVL
    AS does allow for dynamic label generation. I forget the exact syntax at this particular moment, but the functionality is there.

    Ah, here it is, in the manual's section on symbols:
    Additionally, labels defined within macros are by default local to the macro, so if you want to define a global label within a macro, you need to use the LABEL directive, like so:
    Code (Text):
    my_label LABEL $
     
    Last edited by a moderator: May 12, 2024
  3. OrionNavattan

    OrionNavattan

    Tech Member
    182
    180
    43
    Oregon
    Thank you! Took a bit of trial and error, but the AS version is now identical to the ASM68K one.
     
  4. OrionNavattan

    OrionNavattan

    Tech Member
    182
    180
    43
    Oregon