General Project Thread & Feedback

LordOfSquad · Jan 21, 2012

As another layman, I can say that neither 'in' nor 'out' sounds all that great to me. They both sound pretty muddy and washed out.

dsrb · Jan 21, 2012

I don't desire to see people come in and try to define the product for other people just because their views on audio skew a certain way.

Ironic, since what I'm trying to combat is the possibility that your views are skewed, whether you know it or not. Suggesting that people do a sighted test on samples that may come from improperly behaving hardware, and presenting such a test as a valid method of evaluation, just invites a multiplication of the problem.

But yeah, whatever. I've said about as much as I can, or can be bothered, to say.

Cinossu said:

bloody hell

Oh you! So British. :D

saxman · Jan 21, 2012

Cinossu said:

While I agree with dsrb and Falk here, I have a different question entirely.

Why the bloody hell have you added support for GYM playback, of all things?

To test the emulation. VGM is more accurate, but a terrible format. GYM is easier to work with, so it was added first.
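To illustrate why GYM is the easier format to start with: as commonly documented, a GYM file is little more than a stream of one-byte commands, each either a sound chip register write or a "wait one 60Hz frame" marker. A minimal Python sketch of a decoder, assuming a raw command stream with no GYMX header and using hypothetical event names (this is not the project's actual code):

```python
def parse_gym(stream: bytes) -> list:
    """Decode a headerless GYM command stream into a flat list of events.

    Assumed command layout:
      0x00           -> wait one frame (1/60 s)
      0x01 reg data  -> YM2612 write, port 0
      0x02 reg data  -> YM2612 write, port 1
      0x03 data      -> SN76489 PSG write
    """
    events = []
    i = 0
    while i < len(stream):
        cmd = stream[i]
        i += 1
        if cmd == 0x00:
            events.append(("wait", 1))
        elif cmd in (0x01, 0x02):
            if i + 2 > len(stream):
                raise ValueError("truncated YM2612 write")
            port = "ym_port0" if cmd == 0x01 else "ym_port1"
            events.append((port, stream[i], stream[i + 1]))
            i += 2
        elif cmd == 0x03:
            if i + 1 > len(stream):
                raise ValueError("truncated PSG write")
            events.append(("psg", stream[i]))
            i += 1
        else:
            raise ValueError(f"unknown GYM command 0x{cmd:02X} at offset {i - 1}")
    return events
```

VGM, by contrast, has a versioned header and a much larger command set (sample-accurate waits, data blocks, many different sound chips), which is where both its extra accuracy and its extra complexity come from.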

LordOfSquad said:

As another layman, I can say that neither 'in' nor 'out' sounds all that great to me. They both sound pretty muddy and washed out.

To provide some context, the point is the engine takes an existing sound and improves it. I provided the most dramatic example.

Falk · Jan 21, 2012

It's not that 'in' is better; it's that 'out' is worse. I could go all the way into spectral analysis and psychoacoustics, covering inharmonicity and ear fatigue, which is science rather than subjectivity, but as you said, this has veered way off the purpose of this thread.
edit: I was trying to find the part of your post that said "if you think 'in' was better" to quote it, but it's gone. Never mind.

saxman said:

Have you ever tried listening to a 20 kHz sine wave at 44.1 kHz? You can't, because they're basically square waves. Higher sampling rates do in fact improve the audible frequencies.

This approach is so blatantly incorrect on so many levels that I don't even know where to start. First and foremost, the lowest overtone of a waveform with a 20kHz fundamental would be at 40kHz (60kHz for a square wave, which contains only odd harmonics), way beyond the range of human hearing, so even with an unlimited sampling rate a 20kHz sine and a 20kHz square wave would essentially sound identical. At least 40% of adults past 25 won't even hear a 20kHz signal to begin with, especially those brought up in cities, around subways, etc.

Secondly, following conventional DSP practice, especially once you get into DACs and sample rate conversion, it's actually the other way around. Assuming a theoretical listener able to hear up to 100kHz (DOG MAN TEST SUBJECT EXTRAORDINAIRE, please pardon the fact he had a terrible accident during birth), a 20kHz SQUARE wave would theoretically sound like a 20kHz SINE wave at a 44.1kHz sampling rate, since the low-pass filter at the 22.05kHz Nyquist cutoff (which prevents aliasing) will obliterate all the overtones, even the first one. (A sine wave has no overtones, only a fundamental.)
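To make the "obliterated overtones" point concrete, here's a Python sketch that builds a square wave by additive synthesis, keeping only the odd harmonics that fit below Nyquist. That is an idealized band-limited square wave, i.e. what survives a perfect anti-aliasing filter; the function name is just for illustration:

```python
import math


def bandlimited_square(freq_hz, sample_rate_hz, num_samples):
    """Additive synthesis of a square wave using only the odd harmonics
    below Nyquist (amplitude 4/(pi*n) for harmonic n)."""
    nyquist = sample_rate_hz / 2
    harmonics = [n for n in range(1, int(nyquist // freq_hz) + 1, 2)
                 if n * freq_hz < nyquist]
    out = []
    for i in range(num_samples):
        t = i / sample_rate_hz
        out.append(sum(4 / (math.pi * n) * math.sin(2 * math.pi * n * freq_hz * t)
                       for n in harmonics))
    return out
```

At 44.1kHz, a 20kHz "square" wave comes out as exactly (4/π)·sin(2π·20000·t): a plain sine, because its next harmonic (60kHz) is far above the 22.05kHz limit.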

Thirdly, extrapolating from the above, a higher sampling rate benefits more complex waveforms because of their overtone content. A sine wave is as simple as a waveform gets. You cannot represent a square wave close to the Nyquist frequency, but you absolutely can represent a sine wave close to it. In other words, theory directly contradicts your post.
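Counting which harmonics survive shows why the benefit of a higher rate depends on overtone content. A Python sketch for an ideal square wave (odd harmonics only, amplitude 4/(πn)); the function name is illustrative:

```python
import math


def square_harmonics_below_nyquist(fundamental_hz, sample_rate_hz):
    """Return (frequency, relative amplitude) for each odd harmonic of an
    ideal square wave that lies strictly below sample_rate / 2."""
    nyquist = sample_rate_hz / 2
    result = []
    n = 1
    while n * fundamental_hz < nyquist:
        result.append((n * fundamental_hz, 4 / (math.pi * n)))
        n += 2  # square waves contain odd harmonics only
    return result
```

A 20kHz square wave keeps only its fundamental at 44.1kHz, while a 5kHz square wave keeps 2 harmonics (5kHz and 15kHz) at 44.1kHz and 5 at 96kHz.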

saxman said:

Interpolation was always put in primarily and first-and-foremost to allow 44.1kHz samples to be resampled at 48kHz, and vice-versa.

Yes. 100% this.

Falk · Jan 21, 2012

saxman said:

Have you ever tried listening to a 20 kHz sine wave at 44.1 kHz? You can't, because they're basically square waves. Higher sampling rates do in fact improve the audible frequencies.

In the name of science here's something else stemming from this post.

http://dl.dropbox.com/u/19357938/SineSweep.wav A sine wave sweeping from ~2kHz up to ~47kHz at a sampling rate of 96kHz.

I don't know about you, but at 20kHz it doesn't sound much like a square wave to me. In fact, it's pretty inaudible. Despite abusing my ears quite a bit, I'd like to think I still have above-average range, and I can reliably hear up to 19kHz, which is more or less right after the 3 sec mark. Hence I'm not sure you were serious when you said "have you ever tried X". In fact, I'm wondering if -you've- tried it yourself.
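If you want to generate a similar test file yourself, here's a Python sketch using only the standard library. The linear sweep shape, 16-bit mono format and -6dBFS level are assumptions; the original file's exact parameters aren't stated beyond ~2kHz to ~47kHz at 96kHz:

```python
import math
import struct
import wave


def write_sine_sweep(path, start_hz, end_hz, duration_s,
                     sample_rate_hz=96000, amplitude=0.5):
    """Write a mono 16-bit WAV containing a linear sine sweep."""
    num_samples = int(round(duration_s * sample_rate_hz))
    phase = 0.0
    frames = bytearray()
    for i in range(num_samples):
        # Instantaneous frequency moves linearly from start_hz to end_hz.
        freq = start_hz + (end_hz - start_hz) * i / num_samples
        frames += struct.pack("<h", int(amplitude * 32767 * math.sin(phase)))
        # Accumulating phase keeps the waveform continuous as frequency changes.
        phase += 2 * math.pi * freq / sample_rate_hz
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(bytes(frames))
```

For example, `write_sine_sweep("sweep.wav", 2000, 47000, 8.0)` reaches 19kHz at about the 3 second mark, roughly matching the description, though that's only a guess at how the original was made.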

What's actually more interesting, though, is that I guarantee many consumer playback systems attempting to play this 96kHz sweep -will- produce audible artifacts after the 3 sec mark, where the sweep continues from ~19k up to ~47k, which technically should be completely inaudible to human hearing. This is most commonly audible as a very rapid sweep back down and up again at lower volume, where there's supposed to be silence. You aren't actually hearing above 20kHz; you're hearing much lower-frequency signals generated by aliasing from the quick-and-dirty realtime downsampling applied to anything not at the sound driver's default playback rate. In other words, using 96kHz when you don't absolutely need it (or aren't sure the target delivery has the means not to butcher it on playback) is not a good idea.
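A rough way to see where those phantom tones come from: if a device reduces the 96kHz file to its output rate without filtering first, any component above the output's Nyquist frequency folds back down. A small Python sketch, assuming a 44.1kHz output rate and no anti-aliasing filter at all (real drivers vary, and many do some filtering):

```python
def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency that a tone ends up at after being sampled (or naively
    decimated) at sample_rate_hz with no anti-aliasing filter: it folds
    back around the Nyquist frequency."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)
```

Under those assumptions, the part of the sweep between 22.05kHz and 44.1kHz folds from 22.05kHz back down toward 0Hz, and the last stretch from 44.1kHz to ~47kHz comes back up from 0Hz to about 2.9kHz, which fits the "sweep back down and up again" effect.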

http://dl.dropbox.com/u/19357938/SineSweep2.wav Here it is again, converted to 44.1kHz with Adobe Audition's cookie-cutter tools (they aren't spectacular, but they're decent). If you open it in a waveform editor, you'll notice it abruptly drops to silence at the 3 sec mark. This is the anti-aliasing low-pass filter at work, removing everything above the new Nyquist frequency so it can't alias, as the Nyquist theorem requires. What I wanted to point out with the second clip, though, is that right up to that 3 sec point it's going to sound indistinguishable to practically anyone, save a select few, and even then only on very specific playback setups.
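What such a filter does can be sketched as a windowed-sinc FIR low-pass. This is a generic textbook design in Python, not Audition's actual filter; the 20kHz cutoff, 101 taps and Blackman window are assumptions chosen for illustration:

```python
import cmath
import math


def windowed_sinc_lowpass(cutoff_hz, sample_rate_hz, num_taps=101):
    """Linear-phase FIR low-pass: ideal sinc response times a Blackman window."""
    fc = cutoff_hz / sample_rate_hz  # normalized cutoff in cycles/sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def gain_at(taps, freq_hz, sample_rate_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))
```

Designed at 96kHz, this passes 1kHz essentially untouched while attenuating 30kHz by well over 40dB, which is the job a converter's filter does before dropping to 44.1kHz.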

http://dl.dropbox.com/u/19357938/sine.png Lastly, here's empirical evidence that there's no problem representing or playing back sine waves close to the Nyquist limit. The blocks represent the stored data of a sine wave close to Nyquist (19.5kHz at 44.1kHz sampling in this case). The line represents the waveform as it would be played back by any decent DAC that filters out everything above Nyquist. I have no idea how you're coming to the conclusion that it's 'basically square waves'.
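The same picture can be checked numerically. The Python sketch below samples a 19.5kHz sine at 44.1kHz, then estimates values between samples two ways: ideal sinc (Whittaker-Shannon) reconstruction, which a DAC with a brick-wall filter at Nyquist approximates, versus simply holding each stored sample (the "blocks"). The block length and probe points are illustrative choices, not anything taken from the linked image:

```python
import math


def sinc(x):
    """Normalized sinc, the ideal Nyquist-band reconstruction kernel."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruction_errors(freq_hz, sample_rate_hz, num_samples=4001, probes=20):
    """Return the worst error between samples for (sinc reconstruction,
    sample-and-hold), compared with the true continuous sine."""
    samples = [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
               for n in range(num_samples)]
    centre = num_samples // 2  # stay far from the edges of the finite block
    sinc_err = hold_err = 0.0
    for k in range(centre, centre + probes):
        for frac in (0.25, 0.5, 0.75):
            t = k + frac
            true_value = math.sin(2 * math.pi * freq_hz * t / sample_rate_hz)
            # Whittaker-Shannon interpolation over every stored sample.
            sinc_value = sum(s * sinc(t - n) for n, s in enumerate(samples))
            # Staircase: keep the previous sample until the next one.
            hold_value = samples[k]
            sinc_err = max(sinc_err, abs(sinc_value - true_value))
            hold_err = max(hold_err, abs(hold_value - true_value))
    return sinc_err, hold_err
```

Away from the block edges, the sinc reconstruction should land within a hundredth of the true sine (the leftover error comes from truncating the infinite sum), while the held staircase misses by more than half the amplitude at some points.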

steveswede · Jan 21, 2012

@Saxman

Falk and dsrb are right about audio myths; I've seen no end of debates about this on the FL Studio forum (sorry, I chuckled at your 96kHz HD music, since it goes way beyond human hearing; Scubasteve and Teeloops should have pointed that out to you, considering their knowledge of music). If you need more convincing than Falk's and dsrb's brilliant posts, it could be worth your while to sign up to KVR Audio and ask people there about it. I'm not saying that to prove a point, but it's good to understand something correctly.

dsrb · Jan 21, 2012

As well as Steve's good suggestion there, I'll repeat my earlier recommendation of Hydrogenaudio. It has a number of very experienced members, and crucially they can be very patient in explaining things. More so than me, probably!

GerbilSoft · Jan 21, 2012

saxman said:

dsrb said:

saxman said:

Here's a summary of what the sound engine does:
[. . .]
* Interpolation used to generate ultra-clean, high definition sampling above CD quality

What? You'd better not be pushing the myth that upsampling magically generates additional 'quality', 'cleanliness', or 'definition' from thin air.

Well, look at the 2xSaI filter on Gens. That's all interpolation, and it does a fairly decent job at rendering non-existent stuff from existing stuff. The same type of thing can be done on sound. I'm not sure why you'd suggest it's a myth...

If that's the case, then why bother creating "high-definition" graphics for S2HD in the first place? Just use the hq4x filter on Gens. It's the same thing, right? (That, and it'd take a lot less time to implement.)

Sik · Jan 21, 2012

Upscaling and doing interpolation doesn't make it sound better, it makes it sound worse. You're trying to generate data that doesn't exist and interpolation only muffles the waveform. And in fact, your analogy to 2xSAI is perfect - the outcome is crap, and that's because the filter is unable to generate the extra details one would expect from HD graphics.

There's only one instance where upsampling is valid. Let's say we have a 11KHz sample. If you try to pass it as-is to the sound card, it'll sound completely muffled because the sound card (or the driver, if relevant) will apply a ridiculous amount of interpolation. If you upscale it to 44KHz without interpolation (that is, just repeat the samples), it'll sound extremely clear. In other words, interpolation only makes things sound more muffled, not higher quality.

Falk · Jan 21, 2012

Sik said:

There's only one instance where upsampling is valid. Let's say we have a 11KHz sample. If you try to pass it as-is to the sound card, it'll sound completely muffled because the sound card (or the driver, if relevant) will apply a ridiculous amount of interpolation. If you upscale it to 44KHz without interpolation (that is, just repeat the samples), it'll sound extremely clear. In other words, interpolation only makes things sound more muffled, not higher quality.

I don't know if you're trolling but... no. O_o

Black Squirrel · Jan 21, 2012

All that ZIP shows is that your converter multiplies the file size by five.

At some point I'll want to download this game, and I don't really want to wait longer for a negligible effect that's only detectable on high-end sound cards that nobody needs or wants (bar the professionals). The effect might be lovely, but it's ultimately a bit pointless. I actually think the default output of an emulator such as Kega Fusion sounds better for Sonic games, even if it's "squeakier" than intended.
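For a sense of scale on the download-size point, uncompressed PCM size is just sample rate × bytes per sample × channels. A quick Python sketch (the rates below are examples, not the project's actual output settings; compressed formats like OGG Vorbis are far smaller):

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample // 8 * channels


cd = pcm_bytes_per_second(44100, 16, 2)        # CD audio
hi_res = pcm_bytes_per_second(96000, 24, 2)    # 24-bit/96kHz stereo
print(f"CD: {cd} B/s, 24/96: {hi_res} B/s, ratio {hi_res / cd:.2f}")
```

So 24-bit/96kHz stereo needs about 3.3 times the data of 16-bit/44.1kHz stereo (576,000 vs 176,400 bytes per second).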

I think the high definition equivalent of audio tends to be things like surround sound, not differences in frequency.

saxman · Jan 21, 2012

dsrb and I discussed all this today. After re-reading the first couple of posts, I realized I said some things that sound a bit harsh. This wasn't my intent, but I can't deny what I said, so I apologized to him for that. We also talked briefly about the substance of the discussion. It was productive.

I can honestly say I did not expect a debate to erupt from this. I could continue to make points, but it all leads to the same place -- nowhere! That's why I opted out of the discussion: it really takes away from what I wanted to do in the first place, which is to talk about the sound engine we're using! I'm excited about it, and on a personal level, I feel very proud of what we've all been able to accomplish.

Since there was a debate, I think it's fair that I modify what I wrote about the features. So here it is:

* Genesis/Mega Drive sound emulation with support for VGM/VGZ and GYM playback

* Compressor to allow for a full, thick layer of sound

* Tempo and pitch control to allow variations in sound (e.g. revving spindash)

* OGG Vorbis playback support to showcase Tee's music

* Interpolation used to resample and smoothen the audio for alternate sample rates
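For the last item, a minimal sketch of what interpolation-based sample rate conversion can look like, using plain linear interpolation in Python. This is an assumption for illustration; the engine's actual method isn't described here, and good resamplers typically use windowed-sinc or polyphase filters, plus low-pass filtering when converting down in rate:

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation between neighbouring input samples."""
    if not samples:
        return []
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for m in range(out_len):
        pos = m * src_rate / dst_rate            # position in input-sample units
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the last sample
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out
```

Linear interpolation reproduces straight-line signals exactly, which makes it easy to sanity-check, but it only mildly suppresses the spectral images and aliases that a proper resampling filter is meant to remove.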

If anyone would like to ask me about specific points listed, feel free to do so. For instance, a question was asked about the use of GYM, and I explained it was implemented initially as a way of testing the emulation. That's the kind of discussion I'm interested in.

Falk · Jan 21, 2012

saxman said:

I could continue to make points, but it all leads to the same place -- nowhere!

As long as this stops you from asking for people to listen to 20kHz waveforms.

Hamneggs · Jan 21, 2012

So are you guys still having the DirectX issues?

Sik · Jan 22, 2012

Falk said:

I don't know if you're trolling but... no. O_o

I'm not trolling. If anything, the sound hardware is x_x When playing 11KHz samples they interpolate like hell to make it sound "nicer", but the result is extreme muffling. The interpolation is much smaller for higher sample rates. Thereby, if you take a 11KHz waveform, repeat every sample four times and feed it to the sound hardware as 44KHz, it will sound much less muffled.

It doesn't make the original waveform sound better than it is, but rather it prevents it from sounding worse than it should.

dsrb · Jan 22, 2012

But that's completely butchering the signal. The whole point of DAC-side interpolation is to round the waves off, thereby removing the post-Nyquist nonsense that results from square waves. Why would you deliberately circumvent this?!

11.025 kHz sounds muffled because it is. A properly output signal at this rate will only contain frequencies up to 5.5125 kHz, which will naturally sound muffled to us, being used to music with hats, cymbals, and whatnot to 16 kHz and a bit beyond. If you want to get around this, find a better sample, rather than mutilating the signal into an obscene square wave that it was never supposed to be.
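The arithmetic behind those figures is simply that a sampled signal can only carry frequencies up to half its sample rate. A trivial Python sketch:

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a sampled signal can represent: half the sample rate."""
    return sample_rate_hz / 2


for rate in (11025, 22050, 44100, 48000, 96000):
    print(f"{rate} Hz sampling -> content up to {nyquist_hz(rate)} Hz")
```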

Edit: fixing sampling rate figures to proper accuracy

Falk · Jan 22, 2012

Sik said:

Falk said:

I don't know if you're trolling but... no. O_o

I'm not trolling. If anything, the sound hardware is x_x When playing 11KHz samples they interpolate like hell to make it sound "nicer", but the result is extreme muffling. The interpolation is much smaller for higher sample rates. Thereby, if you take a 11KHz waveform, repeat every sample four times and feed it to the sound hardware as 44KHz, it will sound much less muffled.

It doesn't make the original waveform sound better than it is, but rather it prevents it from sounding worse than it should.

That's completely backwards. You're essentially treating interpolation as the problem and zero-order hold as the fix, when it's the other way around: zero-order hold, and the random inharmonics it generates, is the problem, and interpolation is the fix. The rest was pretty much covered by dsrb.

Considering that zero-order hold is pretty much the easiest way to implement a DAC (and sounds bad) and interpolation methods have been refined over the years... decades really, to best rectify the problem I'm completely baffled as to how you could think the problem and solution are the other way around. The irony is you yourself said "You're trying to generate data that doesn't exist" because that's -exactly- what zero-order hold does. It generates frequencies above Nyquist, which should not exist.
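Those extra frequencies can be measured directly. The Python sketch below upsamples one second of a 1kHz tone from 11,025Hz to 44.1kHz in two ways, zero-order hold (repeat each sample four times) and linear interpolation, and then measures the spectral image at 11,025 − 1,000 = 10,025Hz, a frequency that wasn't in the original signal. The tone, rates and helper names are illustrative assumptions:

```python
import math


def tone(freq_hz, rate_hz, n):
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def upsample_hold(x, factor):
    """Zero-order hold: repeat each input sample `factor` times."""
    return [s for s in x for _ in range(factor)]


def upsample_linear(x, factor):
    """Linear interpolation between neighbours. Wraps around at the end,
    so only use it on a whole number of periods."""
    n = len(x)
    return [x[i] + (x[(i + 1) % n] - x[i]) * k / factor
            for i in range(n) for k in range(factor)]


def amplitude_at(y, freq_hz, rate_hz):
    """Amplitude of the sinusoidal component at freq_hz (one DFT bin)."""
    re = im = 0.0
    for m, v in enumerate(y):
        ang = 2 * math.pi * freq_hz * m / rate_hz
        re += v * math.cos(ang)
        im -= v * math.sin(ang)
    return 2 * math.hypot(re, im) / len(y)


x = tone(1000, 11025, 11025)  # exactly 1000 cycles, so no spectral leakage
for name, y in (("hold", upsample_hold(x, 4)), ("linear", upsample_linear(x, 4))):
    print(name, amplitude_at(y, 1000, 44100), amplitude_at(y, 10025, 44100))
```

The held version should show an image at 10,025Hz of roughly a tenth of the tone's amplitude, while linear interpolation knocks it down to roughly a hundredth. A proper resampling filter (e.g. windowed sinc) suppresses it much further still.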

edit: Might as well add that this is why I'm baffled that this audio engine's 'interpolation' results in frequencies that weren't previously there, when interpolation is supposed to get rid of frequencies that shouldn't exist. But -this- dead horse has been beaten to a fine pulp by now and is more a matter of semantics.

edit2: to make this post more useful:
http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem
http://en.wikipedia.org/wiki/Zero-order_hold

Not exactly the most layman of explanations but good enough.

winterhell · Jan 23, 2012

Can we say that the results of sample repetition (like Sik said) and interpolation are similar to those of nearest-neighbour and bilinear scaling in imaging?

btw personal opinions:
I can rarely hear a difference between 192kbps and 320kbps MP3 and CD audio. But if it's recorded at 24-bit/48kHz and done correctly, it sounds better even if lossy.
For the synthesized sound of Sonic 2, running it directly at the target frequency will be better than recording it and converting it afterwards. Or are you only talking about the digital samples, like the "Sega" chant?

Falk · Jan 23, 2012

winterhell said:

Can we say that the results of sample repetition (like Sik said) and interpolation are similar to those of nearest-neighbour and bilinear scaling in imaging?

Quite right, although I'd add that it'd be more akin to a non-integer resize if you're going from 11,025Hz to, e.g., 48kHz (which used to be quite common).

Sample repetition, or 'nearest-neighbour', would look something like this when not every pixel of an input represents the same number of pixels on an output: [image not preserved]
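The same uneven repetition can be counted directly for the audio case. A small Python sketch, assuming the simplest sample-repetition mapping (each output sample copies the most recent input sample at or before its time position); the function name is illustrative:

```python
from collections import Counter


def repeat_counts(src_rate, dst_rate, num_src_samples):
    """How many output samples each input sample is copied to when
    resampling purely by sample repetition."""
    num_dst = num_src_samples * dst_rate // src_rate
    counts = Counter(m * src_rate // dst_rate for m in range(num_dst))
    return [counts[n] for n in range(num_src_samples)]


c = repeat_counts(11025, 48000, 11025)
print(sorted(set(c)), c.count(5))
```

Going from 11,025Hz to 48kHz, 3,900 of every 11,025 input samples get held for five output samples and the rest for four, so the staircase has irregular step widths on top of its sharp edges. From 11,025Hz to 44.1kHz every sample is simply held four times.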

dsrb · Jan 25, 2012

winterhell said:

But if it's recorded at 24-bit/48kHz and done correctly, it sounds better even if lossy.

Assuming you're referring to 24 bit / 48 kHz vs. 16 bit / 44.1 kHz:

Congratulations on having superhuman hearing and/or terrible hardware. Please post ABX results.
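For anyone unfamiliar, ABX results are usually judged by how likely the score would be from pure guessing. A small Python sketch of that one-sided binomial calculation (the 0.05 threshold often used is a convention, not a rule):

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` ABX trials
    by pure guessing (one-sided binomial test with p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
```

For example, 12 correct out of 16 works out to 2517/65536, or about 0.038, while 8 out of 16 is entirely consistent with guessing.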
