
Blast Processing/Direct Color DMA is not the same as general DMA and is specific to the Sega Genesis

Discussion in 'Technical Discussion' started by Cooljerk, Jun 18, 2024.

  1. Cooljerk


    Professional Electromancer Oldbie
    About 5 years ago, Digital Foundry contacted me asking if I could think of any extra special ways to test the accuracy of the then brand-new Analogue Mega SG. They wanted me to point them to a few demos that only extremely accurate hardware/emulation would be able to reproduce. I, obviously, pointed them to Titan, but I also offered to write them a demo of an even more obscure trick: Blast Processing. Not even Titan was doing Blast Processing, and very few emulators could recreate the effect. Anything that can run a Blast Processing demo is by nature very, very accurate. Saying the words "blast processing" got Eurogamer extremely hyped, so they gave me a couple of weeks to write the demo and do a write-up explaining how it worked and the history behind it.

    Now, this topic is not a referendum on the term "blast processing"; the history of how that marketing term went from this specific Direct Color DMA trick to advertisements on TV had already been documented before the demo was written. That's the widely discussed part. The more important part was *HOW* it worked, and specifically *WHY* it worked, and what had changed recently. So I went to work, and after talking with Nemesis, ChillyWilly, and Oerg866, I finally had a detailed enough understanding to produce the video.

    Yet, half a decade later, I still constantly see people say that what was being described in the DF Retro video was "just DMA" that "every console can do" and that "the SNES could do with HDMA" and "so could the Amiga with HAM." These kinds of comments are so irksome because they illustrate the old adage that a little knowledge is a dangerous thing. In this case, these claims reflect a gross misunderstanding of what specifically is unique to the Sega Genesis about the trick, and what differentiates it from, say, HDMA on the SNES.

    Part of this problem is probably the voice-over I recorded for DF Retro. Jon, the person who runs DF Retro, and I originally did our voice work over the phone, casual chat style. The problem was that, the night the video was supposed to go up, Jon hit me up and told me he hadn't captured any of my side of the phone conversation, only his voice. This was like 5 am my time, after I'd been up for a couple of days straight putting the finishing touches on all the graphics/animations for the DF video, and the video had to go up in like 2 hours. His solution was for me to listen to his half of the phone call, then mime my responses again into a mic for him, to recreate the conversation. Extreme tiredness and the general jankiness of this process probably made a lot of what I said come out like gibberish.

    So, in an effort to fix this misconception, I figured I'd restate what was learned and go into the differences between Blast Processing and the other DMA techniques people cite.

    First, to be clear, the trick that is Blast Processing is to prepare a screen's worth of pixel data to be DMA'd in sync with your television's active scan output. The Sega Genesis uses the same bus to both read and write the color palette, so if the video output is reading color RAM (CRAM) at the same time you write to it (as it would be during active scan), the color you are writing is what gets displayed on the fly, completely bypassing the normal concept of a palette. By DMA'ing constant palette changes in sync with the scan-out, you can independently control every pixel's color (or rather, every DMA access slot's color, as a DMA access slot is slightly fatter than a single pixel).
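    To make the "prepare a screen's worth of pixel data" step concrete, here's a rough host-side sketch in Python (not Genesis code) that packs ordinary RGB pixels into Genesis CRAM words and lays them out in the order the DMA would stream them. It assumes 198 access slots per line (the figure from the DMA length in the sync code further down) and simply truncates each 8-bit channel to 3 bits; the function names are made up for illustration.

```python
# Hypothetical host-side helper, not the demo's actual tooling.
# Assumption: one CRAM word per DMA access slot, 198 slots per scanline.
SLOTS_PER_LINE = 198


def rgb888_to_cram(r: int, g: int, b: int) -> int:
    """Pack an 8-bit-per-channel color into a 9-bit Genesis CRAM word.

    CRAM layout is ----BBB-GGG-RRR-: 3 bits per channel, each field
    starting one bit above a nibble boundary.
    """
    r3, g3, b3 = r >> 5, g >> 5, b >> 5  # keep the top 3 bits (simple truncation)
    return (b3 << 9) | (g3 << 5) | (r3 << 1)


def image_to_dma_stream(rows):
    """Flatten rows of (r, g, b) tuples into one row-major list of CRAM words.

    The DMA just walks the buffer in order, one word per access slot, so
    each row must supply exactly SLOTS_PER_LINE colors.
    """
    stream = []
    for row in rows:
        if len(row) != SLOTS_PER_LINE:
            raise ValueError("each row needs one color per DMA access slot")
        stream.extend(rgb888_to_cram(*px) for px in row)
    return stream


if __name__ == "__main__":
    green_lines = image_to_dma_stream([[(0, 255, 0)] * SLOTS_PER_LINE] * 2)
    print(len(green_lines), hex(green_lines[0]))
```

    Running it prints `396 0xe0`: two lines of 198 words, each one the CRAM value for full green.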

    So how is this different from the SNES's DMA? Firstly, the SNES doesn't handle DMA during active scan the same way. On a Sega Genesis, writes to CRAM during active display work as we need: they correctly update the target entry in the palette (this is what causes the CRAM dots in games like Sonic during mid-frame palette swaps). On the SNES, however, due to the way the PPU works, writing to CRAM during active display doesn't work correctly. The data goes to wherever the PPU happens to be reading at that moment, corrupting the palette. In other words, the SNES can't reliably target an individual palette entry during active scan: DMA to CRAM during active scan doesn't work on the SNES, while it DOES work on the Sega Genesis.

    But what about HDMA? The name itself explains the difference: HDMA works during HBLANK, not active scan. That's the "H" in "HDMA." Blast Processing requires DMA during active scan.

    So no, very simply, the SNES cannot do Blast Processing as described.

    What about Amiga HAM? Well, for one, that's not even DMA; it's a special mode of the Denise graphics chip that displays more colors using a custom pixel format, and it works completely differently. Blast Processing can select any color to be displayed during active scan on any given pixel; HAM cannot. HAM's name (Hold-And-Modify) explains what it is doing: it holds and modifies pixels over a range. This is because there isn't enough bus bandwidth on the Amiga to fetch a full red, green, and blue value for every pixel at the required bit depth. So instead, HAM retains 2 of the 3 RGB subchannels of the previous pixel and spends the pixel's data bits on changing the remaining one. Normally each byte of a bitplane maps to 8 pixels, and a pixel's value is assembled from one bit in each plane. In HAM mode, 2 of those bits are a mode select and the rest are data: the mode bits say whether the data is a direct lookup into a small base palette, or which single subchannel of the previous pixel to replace while the other two are held. (Original HAM on Denise uses 6 bitplanes, so each pixel gets 4 data bits; AGA's HAM8 gets 6.) So with HAM, apart from jumping to one of the few base palette colors, you're not plotting pixels directly, you're plotting RGB subchannels. You can't arbitrarily go from one color to another; you have to blend from one color to another over multiple pixels as you change each subchannel independently.
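    For the HAM comparison, a small Python decoder for the original 6-bit HAM mode shows the "hold and modify" behavior: going from black to white when white isn't one of the base colors takes three pixels, one per subchannel. This is a sketch from the commonly documented HAM6 encoding (mode bits 00 = base palette, 01 = modify blue, 10 = modify red, 11 = modify green); starting each line from base color 0 is an assumption, and the names are illustrative.

```python
# Sketch of Amiga HAM6 decoding. Colors are (r, g, b) with 4 bits per channel.
# Assumption: the "previous pixel" at the start of a line is base color 0.


def decode_ham6(pixels, palette):
    """Decode one scanline of 6-bit HAM pixel values into (r, g, b) tuples.

    palette: the 16 base colors. The top 2 bits of each pixel select the
    operation; the low 4 bits are the data.
    """
    r, g, b = palette[0]
    out = []
    for p in pixels:
        mode, data = (p >> 4) & 0b11, p & 0b1111
        if mode == 0b00:    # set: take one of the 16 base colors directly
            r, g, b = palette[data]
        elif mode == 0b01:  # hold red and green, modify blue
            b = data
        elif mode == 0b10:  # hold green and blue, modify red
            r = data
        else:               # 0b11: hold red and blue, modify green
            g = data
        out.append((r, g, b))
    return out


if __name__ == "__main__":
    base = [(0, 0, 0)] * 16  # a palette with no white in it
    # black, then modify red, then green, then blue
    print(decode_ham6([0x00, 0x2F, 0x3F, 0x1F], base))
```

    The output walks through `(15, 0, 0)` and `(15, 15, 0)` before reaching white, which is exactly the fringing you see on HAM images with sharp color edges.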

    So no, the Amiga's HAM is not doing blast processing either.

    Now, many people bring up Jon Burton of Traveller's Tales (GameHut) when talking about this subject, for good reason: he made a video about how he was exploring the technique years before we made our video, which is true. Burton did not invent the trick, however; it's known that SoA knew about it. I can confirm that Iguana and Dave Perry's team also knew about the trick, and there is evidence Sega of Japan knew about it too. However, what all these entities lacked was a sync method. The Sega Genesis has no way to automatically time DMA to the beginning of active scan, and there are very slight timing variations between revisions of the Sega Genesis/Mega Drive. That means hand-tuning DMA to begin at active scan might work on one model of Genesis but not another. Burton's solution was a calibration step to determine the timing of your Sega Genesis before playing, but he considered that solution too inelegant. That's where the second half of our video comes in: Oerg866 came up with a seemingly random series of operations that can reliably, across every official model of Sega Genesis, start DMA at the beginning of active scan. Oerg866 himself didn't even know why this series of opcodes worked reliably; it was Nemesis who went deep into the weeds and found out why it works.

    For those in the dark, the algorithm to sync to active scan is as follows:

    Code (Text):
            /* assumed: a3 = VDP control port (0xC00004), a2 = VDP data port (0xC00000) */

            /* wait for VBlank to begin */
    1:
            btst    #3,1(a3)
            beq.b   1b

            /* wait for VBlank to end -- Sync Error range is up to roughly 12 pixels */
    2:
            btst    #3,1(a3)
            bne.b   2b

            /* set up to scanline by flooding FIFO -- Sync Error range reduces to roughly 1-2 pixels */
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)
            move.w  d0,(a2)

            /* Nudge to DRAM refresh period -- Refresh period is 2 pixels big, eating Sync Error range */
            nop
            nop
            nop
            nop

            /* Process the blast! */
            move.l  #0x934094ad,(a3)        /* DMALEN LO/HI = 0xAD40 (198*224) */
            move.l  #0x95729603,(a3)        /* DMA SRC LO/MID */
            move.l  #0x97008114,(a3)        /* DMA SRC HI/MODE, Turn off Display */
            move.l  #0xC0000080,(a3)        /* dest = write cram => start DMA */

    /* CPU is halted until Blast Processing is complete */
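    The four `move.l` writes in the last step are just pairs of 16-bit VDP register writes, so their constants can be rebuilt on a PC. This Python sketch does that; the helper names are hypothetical, and the source address 0x0006E4 is simply what registers 0x15-0x17 in the listing decode to, not something stated in the post.

```python
# Rebuild the "Process the blast!" control-port longwords (host-side sketch).


def reg_write(reg: int, value: int) -> int:
    """16-bit VDP register write command: 100r rrrr vvvv vvvv."""
    return 0x8000 | ((reg & 0x1F) << 8) | (value & 0xFF)


def pair(w1: int, w2: int) -> int:
    """Two 16-bit control-port writes issued together as one move.l."""
    return (w1 << 16) | w2


def blast_setup(length_words: int, src_addr: int, mode_set_2: int = 0x14):
    """Return the four longwords written to the VDP control port.

    mode_set_2 = 0x14 sets DMA enable and Mode 5 with the display bit
    (0x40) clear, matching "Turn off Display" in the listing.
    """
    s = src_addr >> 1  # DMA source registers hold the address in words
    return [
        pair(reg_write(0x13, length_words & 0xFF), reg_write(0x14, length_words >> 8)),
        pair(reg_write(0x15, s & 0xFF), reg_write(0x16, (s >> 8) & 0xFF)),
        # register 0x17 bit 7 = 0 selects a 68k-memory-to-VDP transfer
        pair(reg_write(0x17, (s >> 16) & 0x7F), reg_write(0x01, mode_set_2)),
        0xC0000080,  # CRAM write to address 0 with the DMA bit (CD5) set
    ]


if __name__ == "__main__":
    for word in blast_setup(198 * 224, 0x0006E4):
        print(f"0x{word:08X}")
```

    The printed values match the listing (0x934094AD, 0x95729603, 0x97008114, 0xC0000080), which also confirms the 198 x 224 = 0xAD40 length comment.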
    The basic gist is that you wait for VBlank to end, because VBlank is one of the few timing signals the Genesis gives you to sync against. We can determine when VBlank begins and when it ends within a range of time, so we use that as a starting point. Now, again, different revisions of Sega Genesis have different timing, so this isn't a precise time; it can vary by microseconds. Nemesis says this yields an error range of about 12 pixels, meaning our DMA would start anywhere within a range of 0-12 pixels. The code that follows writes the 16-bit word in d0 to the VDP through a2 thirteen times in a row; these writes flood the VDP's FIFO queue and stall the CPU by known amounts, moving the timing of the next operations to the start of a scanline. Again, between revisions this point at the start of the scanline varies slightly. This narrows the range in which DMA will start to 0 to 2 pixels, meaning that if we fired off our DMA now, it would begin (depending on the revision of Sega Genesis) at any of 0, 1, or 2 pixels from where we'd assume.

    This is where the very specific configuration of the Sega Genesis hardware comes into play. To save cost, the Genesis uses a kind of RAM called Dynamic RAM, or DRAM, as opposed to Static RAM, or SRAM. Functionally, both store bits, but in different ways: SRAM holds each bit in a flip-flop circuit that keeps its state for as long as the power is on, while DRAM stores each bit as electrical charge on a tiny capacitor. Presence of charge in a DRAM cell is 1, lack of charge is 0 (in a very basic sense). That charge leaks away over time, and as it does, 1s become 0s. So, to avoid this, every so often the DRAM cells need to be read and recharged ("refreshed"). When this occurs, the system pauses for just a moment to allow the DRAM to refresh, and then continues. These periods of refreshing are called DRAM refreshes.

    When I say halted, I mean everything that needs that memory, including DMA, so for this period of time DMA burst writes will not go forward; they are held until the DRAM refresh is complete. And that's the secret to the entire trick. It just so happens that, near the start of active scan, on the left side of the screen past the edge of the television display, there is a DRAM refresh period. This refresh period is very short, a window about 2 pixels wide during which DMA is halted. And that's what is used to sync the display: by using 4 NOP instructions, you can delay the Genesis just enough to nudge the start of DMA into this refresh period. Because the 2-pixel width of the refresh period equals the maximum timing variance between the various models of Sega Genesis, that timing difference effectively disappears. No matter how far off the DMA timing is, be it 0, 1, or 2 pixels, it'll be held until the DRAM refresh period is over, at which point DMA begins. If our DMA was 2 pixels off from where we assumed it'd start, after the DRAM refresh period it'll be 0 pixels off. If our DMA was 1 pixel off, after the DRAM refresh period it'll be 0 pixels off. And likewise, if the DMA was 0 pixels off, it's still 0 pixels off after DRAM refresh. This means DMA always begins at exactly the same point on the scanline, the first pixel after the DRAM refresh period, no matter what model of Sega Genesis.
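    The clamping argument above fits in a few lines of Python. This is a toy model, not a VDP timing emulation: positions are in pixels relative to where the sync code expects the DMA request to land, and placing the refresh window at offset 0 with a width of 2 is an assumption taken from the description, not measured slot timing.

```python
# Toy model: a DMA request that arrives during the refresh window is held
# until the window ends, so every offset inside the window collapses to one
# start point. Window position and width are assumptions from the post.
REFRESH_START = 0  # assumed: the NOPs land the request at the window's start
REFRESH_WIDTH = 2  # refresh period is about 2 pixels wide


def dma_start(request_offset: int) -> int:
    """Pixel at which DMA actually begins for a request at request_offset."""
    window_end = REFRESH_START + REFRESH_WIDTH
    if REFRESH_START <= request_offset <= window_end:
        return window_end  # held by the refresh, released when it ends
    return request_offset  # outside the window: starts wherever it lands


if __name__ == "__main__":
    for offset in (0, 1, 2):
        print(offset, "->", dma_start(offset))
```

    All three offsets print `-> 2`. A request that lands before or after the window is not corrected, which is why the NOP nudge has to put every revision's 0-2 pixel spread inside the window first.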

    THAT is the secret sauce. THAT is what Sega of America, and Traveller's Tales, and Iguana, and the countless other entities that tried to get this trick working missed. THAT is what Oerg866 discovered and Nemesis explained. That exact trick, the ability to reliably sync DMA to the beginning of active scan, is unique to the Sega Genesis and is accomplished by abusing the very lowest-level hardware on the machine. Every bit of this trick is tied not just to the hardware in the Sega Genesis, but to the exact configuration of that hardware. So no, other systems can't do Blast Processing, and no, Blast Processing is not just DMA.

    (Although, secretly, yes, the Amiga can do this exact same trick using the Copper, where the trick is known as ChunkyCopper in Amiga circles. The reasons for doing it there are completely different than on the Sega Genesis, and unlike on the Genesis, it's handled by a secondary co-processor that can perform DMA without blocking the 68000, so unlike the Genesis, the Amiga actually shipped games with "blast processing." Gloom is the very best example of this, but the full Amiga explanation is for another time.)

    Hopefully this clarifies quite a bit about Blast Processing, and I'll stop seeing those "it's just DMA bro" type posts. Shout-outs to Nemesis, Chilly Willy, and Oerg866 for their help!
    Last edited: Jun 19, 2024
  2. Chimes


    The One SSG-EG Maniac Member
    Any time I hear legends about mid-scanline changes I shiver. Any time I hear bitbanging as the screen is being drawn my eyes dilate. Any time I hear syncing using arcane black magic I can feel my soul running away.
    This is a piece of horror flash fiction and I love it. It does make me wonder if this could be used as a Great Value HDMA, where small streams of data are jammed in on every scanline... I wonder what the applications of that could be.
  3. Hivebrain


    53.4N, 1.5W
    Very interesting. Could you use this to write an image to the top half of the screen in a game, and have the bottom half display normally? In your example the display is disabled, but is that absolutely necessary, and can it be re-enabled mid-frame?
  4. Cooljerk


    Professional Electromancer Oldbie
    Yes, you can! First, there is a limitation with this Blast Processing trick: there aren't really enough resources on the Sega Genesis to store a full-screen image to be DMA'd to the screen, so the best you'll normally get with Direct Color DMA is 2/3 of the screen filled. That's why demos like this seem to cut off:


    The cutoff point is the memory limitation of the Genesis. So you're not really ever going to have a full-screen Blast Processing demo. There are many caveats, though. As you noted, the display is turned off for the duration of the DMA, so no sprites, tiles, or any other actual Sega Genesis elements can overlap the Blast Processing area. Even if you could draw sprites, the way this works is by creating CRAM artifacts, which would be pasted over those elements anyway. So think of it like a top-most immutable layer. Below the BP area, however, once the display is turned back on, you can use normal tiles and sprites and everything. This is a demo of a very simple Snatcher-like point-and-click text adventure made using this method:

    The main problem with Blast Processing is that the CPU is halted for about 2/3 of the frame, severely limiting what you can do. You really only have about 40 scanlines before you reach VBlank, so your CPU window is extremely small, really only enough for something like the adventure game above.
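    The memory limit behind the cutoff is easy to check with back-of-the-envelope arithmetic. This Python sketch assumes one 16-bit CRAM word per access slot, 198 slots per line (from the DMA length in the sync code), and the Genesis's 64 KB of main 68000 work RAM as the only place the buffer can live; the function names are illustrative.

```python
# Rough memory budget for a RAM-resident Blast Processing buffer.
WORDS_PER_LINE = 198                 # DMA access slots per scanline (from 0xAD40 = 198*224)
BYTES_PER_LINE = WORDS_PER_LINE * 2  # 396 bytes per scanline
WORK_RAM_BYTES = 64 * 1024           # main 68000 work RAM on a stock Genesis


def frame_bytes(lines: int) -> int:
    """Buffer size needed for the given number of Blast Processing lines."""
    return lines * BYTES_PER_LINE


def lines_that_fit(buffer_bytes: int) -> int:
    """Upper bound on lines a buffer of this size can hold."""
    return buffer_bytes // BYTES_PER_LINE


if __name__ == "__main__":
    print(frame_bytes(224))                # all 224 active lines
    print(lines_that_fit(WORK_RAM_BYTES))  # if nothing else used RAM at all
```

    A full 224-line frame needs 88,704 bytes, more than all of work RAM, and even an empty RAM would only hold 165 lines. Once the program's own variables and stack take their share, you land in the neighborhood of the 2/3-screen figure.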

    Now, if you have a second 68k that can do processing in parallel, like with a Sega CD, then you can do much neater things. Chilly Willy made a small demo of a Wolfenstein clone using Blast Processing on the Sega CD:

    This has even more restrictions, though. You're limited to just a 128k window of work RAM for the Blast Processing buffer, so you're limited to half-height resolution. The normal vertical scaling methods of the Sega Genesis don't work, because it's not drawing using the planes. But it shows that, yes, you can have a fully playable Blast Processing game this way.

    Just to note: all this talk isn't meant to glorify Blast Processing. It's a rather useless trick for most things. In addition to the height limitation and the way it starves your CPU, it also produces visible artifacts, because DRAM refresh periods occur during active scan as well. Every time one happens, you'll see a pixel doubled, because the DMA is halted but the TV scanline keeps going. So you wind up with 5 thin strips of double-pixel artifacts running down the screen. You can adjust for them by figuring out exactly which pixels in your source image will be doubled up and blending them together to account for the double-wide pixel, but it's still more or less visible depending on the image. There are better methods for getting more colors out of the Sega, like Titan's 512c mode, which changes the palette every scanline instead of every DMA access slot. The sole purpose of all this discussion is to demystify one of the most notorious parts of Sega history and clear up a lot of the confusion people have when talking about this stuff.
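    The doubling artifact and the blend-based fix described above can be modeled on a single line of gray levels. This is a toy Python sketch: the refresh positions passed in are hypothetical placeholders (the real ones depend on the VDP's slot layout), and both function names are made up for illustration.

```python
# Toy model of the refresh "double pixel" artifact and a blend-based fix.
# refresh_slots: positions in the displayed line where DMA stalls, so the
# last written color is shown again. Positions here are placeholders.


def displayed_line(source, refresh_slots):
    """Simulate scan-out: at a refresh slot the previous color repeats."""
    out, i, pos = [], 0, 0
    while i < len(source):
        if pos in refresh_slots and out:
            out.append(out[-1])  # DMA paused, the scanline keeps going
        else:
            out.append(source[i])
            i += 1
        pos += 1
    return out


def compensate(target, refresh_slots):
    """Build a source line whose doubled pixels show the average of the two
    target pixels they cover, instead of shifting the rest of the line."""
    src, pos = [], 0
    while pos < len(target):
        if pos + 1 in refresh_slots and pos + 1 < len(target):
            src.append((target[pos] + target[pos + 1]) // 2)  # this one gets doubled
            pos += 2
        else:
            src.append(target[pos])
            pos += 1
    return src


if __name__ == "__main__":
    target = [10, 20, 30, 40, 50]
    print(displayed_line(target, {2}))                      # naive: doubled and shifted
    print(displayed_line(compensate(target, {2}), {2}))     # pre-blended
```

    Feeding the target line straight in gives `[10, 20, 20, 30, 40, 50]` (a doubled pixel and everything after it pushed right); the compensated version gives `[10, 25, 25, 40, 50]`, which keeps the line aligned at the cost of a softened spot.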

    On a stock Sega Genesis, you could reduce the height of the BP source image to give yourself more scanlines for computation, though. Perhaps a setup where only the first 1/3 of the screen is Blast Processing and the bottom 2/3 is normal Sega Genesis output would leave enough room for computation.
  5. Hivebrain


    53.4N, 1.5W
    I was thinking about only drawing the first 16 or so scanlines for some kind of fancy HUD. Realistically it would take up too much RAM, plus there's the overhead of actually drawing to the RAM buffer first.
  6. Cooljerk


    Professional Electromancer Oldbie
    Yeah, the utility of Blast Processing is very limited. The main benefit is that you have a direct color framebuffer, something where you can arbitrarily plot pixels. That actually makes it really useful for rendering polygons, but you're going to be so CPU-starved that you won't have anything left over for the transformation math or for plotting the pixels into the buffer. With a Sega CD you have a second 68k, which can handle that a bit better, though. Maybe a game like Frontier: Elite II would benefit from this.
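    Here's a hedged Python sketch of why a direct color buffer suits polygon rendering: any pixel can take any CRAM color with one store, and a filled polygon boils down to horizontal spans. The 198-word, row-major layout is carried over from the sync code's DMA length; the class and method names are hypothetical, and this models the data layout only, not real 68k performance.

```python
# Direct-color framebuffer model: one CRAM word per DMA access slot,
# row-major, exactly the order the Blast Processing DMA would stream it.


class DirectColorBuffer:
    def __init__(self, width: int = 198, height: int = 224):
        self.width, self.height = width, height
        self.words = [0] * (width * height)

    def plot(self, x: int, y: int, cram_word: int) -> None:
        """Set one slot to any color; off-screen coordinates are ignored."""
        if 0 <= x < self.width and 0 <= y < self.height:
            self.words[y * self.width + x] = cram_word

    def hspan(self, x0: int, x1: int, y: int, cram_word: int) -> None:
        """Fill x0..x1 inclusive on line y: the inner loop of a polygon fill."""
        if not 0 <= y < self.height:
            return
        lo = max(0, min(x0, x1))
        hi = min(self.width - 1, max(x0, x1))
        row = y * self.width
        for x in range(lo, hi + 1):
            self.words[row + x] = cram_word


if __name__ == "__main__":
    buf = DirectColorBuffer()
    buf.plot(5, 1, 0x0EEE)      # one white pixel on line 1
    buf.hspan(12, 10, 0, 0x000E)  # a short red span on line 0
    print(buf.words[10:13], hex(buf.words[198 + 5]))
```

    There is no palette step anywhere in this: the value stored is the color shown, which is the advantage over tile-based tricks mentioned in the reply below.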

    Even for static adventure games, things like 512c or Traveller's Tales' multi-layer trick with shadow/highlight mode would probably be easier and give better-looking, higher-resolution results. Really, the ability to plot single pixels arbitrarily is the only real advantage of Blast Processing. Oh, and the ability to choose any color from the master palette without restriction; that's another big advantage that 512c and the TT trick don't have.

    Another thought is perhaps a cartridge with a co-processor, similar in concept to the 32X, only without the need for an AV passthrough cable. Have the cart's co-processor handle basically all the logic, rendering, etc., and use the Genesis as a big television adapter. You could even use SRAM on the cart as an internal framebuffer to speed things up; this wouldn't have the 128k work RAM bottleneck of the Sega CD. Bank-switch between a couple of different areas of SRAM to instantly serve up a new frame, like double buffering.
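    The double-buffering idea can be sketched on the host side in Python. Everything cart-specific here is hypothetical: the two base addresses are made up, and the co-processor is not modeled. The sketch only shows the bookkeeping: draw into the back buffer, then flip and point the DMA source registers (0x15-0x17) at the finished one. Each buffer sits in its own 128 KB block because a single Genesis DMA transfer can't cross a 128 KB boundary.

```python
# Double-buffer bookkeeping for a hypothetical co-processor cart.
BUFFER_BASES = (0x200000, 0x220000)  # hypothetical SRAM addresses, 128 KB apart
FRAME_BYTES = 198 * 224 * 2          # one full Blast Processing frame (88,704 bytes)


def dma_source_regs(addr: int):
    """Values for VDP registers 0x15, 0x16, 0x17 for a 68k -> VDP transfer."""
    s = addr >> 1  # the registers hold a word address
    return (s & 0xFF, (s >> 8) & 0xFF, (s >> 16) & 0x7F)


class DoubleBuffer:
    def __init__(self, bases=BUFFER_BASES):
        self.bases = bases
        self.front = 0  # index of the buffer currently being DMA'd to the screen

    @property
    def back_base(self) -> int:
        """Where the co-processor should draw the next frame."""
        return self.bases[1 - self.front]

    def flip(self):
        """Call once the back buffer is complete; returns the new source registers."""
        self.front = 1 - self.front
        return dma_source_regs(self.bases[self.front])


if __name__ == "__main__":
    assert FRAME_BYTES <= 0x20000  # a frame fits inside one 128 KB DMA window
    db = DoubleBuffer()
    print(hex(db.back_base), db.flip(), hex(db.back_base))
```

    Flipping only changes three register values per frame, so the Genesis side stays as cheap as in the original listing.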
    Last edited: Jun 19, 2024
  7. LocalH


    roxoring your soxors Tech Member
    Rock Band 3 Deluxe
    The cool thing about the "secret sauce" is that it's literally CPU/VDP lockstep, which can be *quite* powerful (in the C64 demo scene, it's called "stable raster" and powers 100% of the advanced effects and software display modes that democoders have invented). That could also be used to feed timed writes to other VDP registers, like the backdrop register at VDP$07 (although it'd be really nice, I don't believe we can DMA to VDP registers lol). You could minimize the number of cycles it takes to write to the register by using more than one CPU register to hold color indices, and of course by unrolling any loops necessary to write consecutive values to the register. I know from some tests I did a long-ass time ago that writes to VDP$07 don't cause CRAM dots like writes to CRAM do. I don't know if writing to the backdrop register would be affected by DRAM refresh, but I'm thinking not? Backdrop register is also the only way to display colors $00, $10, $20, and $30.
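    As a sketch of the backdrop idea (not something tested in this thread), the control-port words for timed writes to VDP register 7 can be precomputed, so the lockstepped CPU loop only has to store them. Register 7 holds a 2-bit palette line and a 4-bit color index, which is why entry 0 of each line ($00, $10, $20, $30) is reachable there. The helper names are hypothetical.

```python
# Precompute control-port words for backdrop-color (VDP register 7) writes.


def backdrop_write(palette_line: int, index: int) -> int:
    """16-bit control-port word selecting CRAM entry palette_line:index as backdrop."""
    if not (0 <= palette_line < 4 and 0 <= index < 16):
        raise ValueError("palette line is 0-3, color index is 0-15")
    return 0x8700 | (palette_line << 4) | index


def backdrop_sequence(entries):
    """One word per timed write, e.g. to preload into several CPU registers."""
    return [backdrop_write(line, idx) for line, idx in entries]


if __name__ == "__main__":
    # entry 0 of lines 1-3: colors tiles can't show, since index 0 is transparent
    print([hex(w) for w in backdrop_sequence([(1, 0), (2, 0), (3, 0)])])
```

    Keeping several of these words in spare data registers and unrolling the stores is the cycle-saving approach LocalH describes.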

    Wonder what else can be done on the Genesis with that lockstep (that hasn't already been discovered, I'm kinda out of the loop on MD dev, having spent a few years hacking on the Harmonix GH/RB engine)? Writing on the same cycle each line can make some really cool effects if one finds the right way to abuse the state of the VDP (especially if you can get internal registers/latches desynced from normal behavior, which is what powers C64 effects like opening the border).

    Maybe I need to take a break from RB modding and revisit MD dev ;)
  8. Cooljerk


    Professional Electromancer Oldbie
    Take a look at what the Amiga can do with the Copper and copper lists; that's what Blast Processing can do, especially when paired with a Sega CD. That's probably the closest comparison: it's making the Genesis act like a dedicated DMA co-processor, like the Amiga's.