3.1: Introduction
Ok. Now we have chosen our weapons and our preferred file format and can start. And soon the next problem arrives: How do we get sound out of this grey box at all?
(You can skip this section if you're an experienced "sound output system" coder. I'd recommend reading through it anyway, though, as you might pick up some ideas you didn't know about before.)
The first question that arises is which sound API to use. Assuming that we want to use Windows for sound output, there are two possibilities: waveOut (the "normal" way the PC does sound) and DirectSound.
My choice was DirectSound, for some simple reasons: you get a looping buffer you can write into directly, you can query the play cursor at any time (which we'll exploit for timing later), and the latency is both lower and better defined than with waveOut's queued buffers.
3.2: The DirectSound Player
The DirectSound init procedure is quite simple (look into the DirectSound SDK for further explanation): Get an IDirectSound interface with DirectSoundCreate, set your cooperative level to the "priority" setting ("exclusive" would be even better for demos, the only problem is that it's unsupported and in fact the same as "priority", at least with DirectX8), create the primary and secondary buffers, set both to your preferred output format (I'd suggest 16bit 44.1kHz signed stereo PCM), lock the entire secondary buffer, clear it, unlock it again...
... and then play it.
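In code, the whole sequence boils down to something like this. This is just a sketch with all error checking left out, and the names (g_ds, g_buffer, ...) as well as the roughly one-second buffer size are my own example choices, not gospel:

    // rough sketch of the DirectSound init described above (error checking omitted)
    #include <windows.h>
    #include <string.h>
    #include <dsound.h>

    static IDirectSound*       g_ds;
    static IDirectSoundBuffer* g_primary;
    static IDirectSoundBuffer* g_buffer;                  // our looping secondary buffer
    static const int           g_bufferBytes = 4*44100;   // ~1 second of 16bit stereo

    void InitSound(HWND hwnd)
    {
        DirectSoundCreate(0, &g_ds, 0);
        g_ds->SetCooperativeLevel(hwnd, DSSCL_PRIORITY);

        // primary buffer: set the output format to 16bit 44.1kHz stereo
        WAVEFORMATEX wfx = {0};
        wfx.wFormatTag      = WAVE_FORMAT_PCM;
        wfx.nChannels       = 2;
        wfx.nSamplesPerSec  = 44100;
        wfx.wBitsPerSample  = 16;
        wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
        wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

        DSBUFFERDESC desc = {0};
        desc.dwSize  = sizeof(desc);
        desc.dwFlags = DSBCAPS_PRIMARYBUFFER;
        g_ds->CreateSoundBuffer(&desc, &g_primary, 0);
        g_primary->SetFormat(&wfx);

        // secondary buffer: this is the one we'll keep refilling
        desc.dwFlags       = DSBCAPS_GETCURRENTPOSITION2 | DSBCAPS_GLOBALFOCUS;
        desc.dwBufferBytes = g_bufferBytes;
        desc.lpwfxFormat   = &wfx;
        g_ds->CreateSoundBuffer(&desc, &g_buffer, 0);

        // clear the buffer so we start with silence...
        void *p1, *p2; DWORD n1, n2;
        g_buffer->Lock(0, g_bufferBytes, &p1, &n1, &p2, &n2, 0);
        memset(p1, 0, n1); if (p2) memset(p2, 0, n2);
        g_buffer->Unlock(p1, n1, p2, n2);

        // ... and play it, looping forever
        g_buffer->Play(0, 0, DSBPLAY_LOOPING);
    }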
Well. Play WHAT? We will need to fill the buffer with data somehow. Again, there are two ways: either you render the sound from your main loop, e.g. by calling some "update the sound buffer" function once per frame, or you put the whole sound output into a separate thread that runs in the background and doesn't care about the rest of the demo at all.
The solution I finally used was kind of a hybrid between those two ways. First of all, I decided that I wanted to use a sound thread for output. To make things easy, this thread would be a simple loop which does the following things: check how far the play cursor has moved since the last run, render exactly that many new samples and write them into the buffer right behind the old ones, sleep for a while, repeat.
I know, the DirectSound SDK and many other sources will make it seem that things like double-buffering or DirectSound's notorious Position Notifications are a necessity, but in fact they aren't. The only thing that's necessary is that you refill the buffer in time, and the way of determining what's "in time" is completely your decision. Actually, my sleep command waited for about one quarter of the buffer size, so that there's always plenty of headroom in the buffer.
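Such a thread could look roughly like this. Again only a sketch: it would be started with CreateThread() and bumped up with SetThreadPriority(), g_buffer and g_bufferBytes are from the init sketch above, and RenderSamples() is just a placeholder for whatever actually produces the music (we'll get to that in 3.4):

    // a very naive version of the sound thread: poll, refill, sleep, repeat
    static DWORD g_writePos = 0;        // where we'll write next, in bytes

    static void RefillBuffer()
    {
        DWORD playPos, dummy;
        g_buffer->GetCurrentPosition(&playPos, &dummy);

        // how many bytes have been played (and can thus be refilled) since last time?
        DWORD todo = (playPos - g_writePos + g_bufferBytes) % g_bufferBytes;
        if (!todo) return;

        void *p1, *p2; DWORD n1, n2;
        g_buffer->Lock(g_writePos, todo, &p1, &n1, &p2, &n2, 0);
        RenderSamples((short*)p1, n1 / 4);              // 4 bytes per stereo sample
        if (p2) RenderSamples((short*)p2, n2 / 4);      // wrap-around part of the lock
        g_buffer->Unlock(p1, n1, p2, n2);

        g_writePos = (g_writePos + todo) % g_bufferBytes;
    }

    DWORD WINAPI SoundThread(LPVOID)
    {
        for (;;)
        {
            RefillBuffer();
            Sleep(250);     // about a quarter of our one-second buffer
        }
    }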
Now for the CPU time interference problem. I wanted the synth renderer to be in sync with the video rendering engine without sacrificing any of the advantages of perfect background playing. I achieved this by defining a synchronisation event (look into the Win32 SDK for details) which can "trigger" the sound thread loop: I replaced the Sleep() command with WaitForSingleObject(), which returns either when the specified timeout has run out or when the event has been set.
This way, I was able to trigger the event in the main loop via SetEvent(). Due to the inner workings of the Windows scheduler and the fact that my sound thread runs at a higher priority level, the main thread is suspended and the sound thread does one run of the loop. As soon as it reaches WaitForSingleObject() again, the main thread continues. So this is kinda like a direct call into the sound rendering routine - and as soon as your main loop takes too much time for the sound to run stably, the sound thread's timeout value comes into play and "renders the sound in the background" again.
If you want to keep the sound thread from being called too often, simply put a "minimum time check" into the loop which skips rendering if not enough samples have been played since the last call.
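Put together, the refined loop could look something like this (a sketch again; g_soundEvent would be an auto-reset event created with CreateEvent(0, FALSE, FALSE, 0), and the main loop simply calls SetEvent(g_soundEvent) once per frame):

    // refined version: the loop waits on an event instead of just sleeping,
    // so the main loop can force a refill at any time via SetEvent()
    static HANDLE g_soundEvent;

    DWORD WINAPI SoundThread(LPVOID)
    {
        for (;;)
        {
            // returns after 250ms at the latest, or immediately when the event is set
            WaitForSingleObject(g_soundEvent, 250);

            // minimum check: skip this run if less than ~10ms worth of samples
            // (441 stereo samples = 4*441 bytes) have been played since the last refill
            DWORD playPos, dummy;
            g_buffer->GetCurrentPosition(&playPos, &dummy);
            if ((playPos - g_writePos + g_bufferBytes) % g_bufferBytes < 4*441)
                continue;

            RefillBuffer();     // same GetCurrentPosition()/Lock()/render/Unlock() as above
        }
    }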
3.3: Latency and synchronisation
Let's recall a key property of what we're doing here:
"The purpose of this sound system is playing back music."
This may sound trivial, but this sentence is the key to all latency problems, simply because there is no latency problem anymore. When you simply play back a musical piece, nothing will occur unexpectedly. You play a consistent stream of data which could just as well come straight from a .WAV file and will never ever change during playback. That way, you can make the latency as high or low as you want, it doesn't matter - it's clear what will be played anyway, and no one cares if the sound comes out of the speakers a bit later.
No one cares? Well, I wanted to synchronize the video to the sound, so I'd better care about when the sound will actually be played. And most people would now try to make the latency as low as possible, just to get the video as close to the audio as they can.
And they forget one thing: The actual latency is known. It's exactly one buffer of sound in length (plus maybe the 20ms additional DirectSound mixing latency, but in most cases you can safely ignore that). So what stops us from just "turning back" our clock the length of one sound buffer? Nothing. And we'll happily recognize that we're in perfect sync then.
So, the demo's main timing source looks like this:
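In terms of the sketches above it could be something like this - assuming the refill code also adds everything it renders to a running sample counter, let's call it g_totalRendered:

    static LONGLONG g_totalRendered;   // samples rendered so far, updated on every refill

    // current song position in samples - negative until the first rendered
    // sample has actually reached the speakers!
    LONGLONG GetTimer()
    {
        DWORD playPos, dummy;
        g_buffer->GetCurrentPosition(&playPos, &dummy);

        // how many bytes have been played since the last refill...
        DWORD played = (playPos - g_writePos + g_bufferBytes) % g_bufferBytes;

        // ...and the clock turned back by exactly one buffer length
        return g_totalRendered + played/4 - g_bufferBytes/4;
    }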
And voilà, we have a timing source which is in perfect sync with the audio output and will stay that way. Just remember that it will start at minus the buffer size when playback begins, so better make your timer values signed and wait some time before you start the visuals :)
As this would be faaaar too easy, there are of course some things you've got to consider: DirectSound's GetCurrentPosition() function may be inaccurate at times. You MUST specify DSBCAPS_GETCURRENTPOSITION2 for your secondary buffer, and you MUST wrap all relevant routines (the sound thread's loop except for the Sleep()/WaitForSingleObject() call, and the whole GetTimer() routine) in critical sections or mutexes (look into the Win32 SDK again), as you will run into synchronisation problems otherwise...
... and even then, the timer value may skip a bit every few seconds, especially with badly written sound card drivers (can you spell creative?). The only workaround I found for this was checking whether the timer delta from the last to the current call made sense. If it was bigger than e.g. half the sound buffer size, the current position was ignored and my routine returned the last known position instead. This is far from perfect, but as said - it happened only for one frame every 20 or 30 seconds, and nobody will notice a small timing jitter now and then.
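Applied to the GetTimer() sketch above, both fixes could look roughly like this (the same critical section obviously has to be entered by the sound thread around its refill code, too):

    static CRITICAL_SECTION g_soundCS;      // InitializeCriticalSection() at startup
    static LONGLONG         g_lastTimer;    // the last value we handed out

    LONGLONG GetTimerSafe()
    {
        EnterCriticalSection(&g_soundCS);

        LONGLONG t     = GetTimer();        // the sketch from above
        LONGLONG delta = t - g_lastTimer;

        // sanity check: if the position jumped backwards or by more than half a
        // buffer (g_bufferBytes/8 samples), don't believe it, return the old value
        if (delta < 0 || delta > g_bufferBytes/8)
            t = g_lastTimer;
        else
            g_lastTimer = t;

        LeaveCriticalSection(&g_soundCS);
        return t;
    }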
If you want to synchronize your demo events to certain notes or events in the song, don't waste your time trying to synchronize the song position counter to the clock. (It's possible - with a small FIFO queue which receives position/number-of-rendered-samples pairs as the player passes each position and which is then read out by the GetSongPosition function up to the "real" timer value - but why bother?) Just extend your music player with routines that calculate the timer value from the song position and vice versa, use these in your authoring tool, and store only timer values for the events in the actual demo. This makes things a whole lot easier (and the player code shorter again, without losing the possibility of ultra-tight syncing).
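Just to illustrate how trivial those conversion routines can be: for a tracker-style song with a constant speed and BPM (using the usual 2.5/bpm seconds per tick), the whole thing boils down to something like this - tempo changes make it slightly more involved, but the principle stays the same. The numbers here are purely illustrative:

    // purely illustrative: 6 ticks per row, 125 BPM, 44.1kHz output
    static const double g_samplesPerRow = 6 * 2.5 * 44100.0 / 125.0;

    // timer value (in samples) for a given global row number...
    LONGLONG TimeFromSongPos(int row)    { return (LONGLONG)(row * g_samplesPerRow); }

    // ...and the row that is playing at a given timer value
    int      SongPosFromTime(LONGLONG t) { return (int)(t / g_samplesPerRow); }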
3.4: The rendering loop
Now to the rendering. It makes sense to use a certain granularity, as the synth will most probably have a "frame rate" and aligning the rendering blocks to that rate is in most cases a good idea. Just remember one thing:
A bad idea, however, is to make your buffer sizes a power of two.
The times when ASM coders used AND operations to mask out the buffer offsets are over. Those one or two cycles for a compare operation don't hurt. So, there's no reason for using power-of-two buffer sizes except that you may be used to it. And in fact, it's even better if you don't. I won't go into too much detail here, but if you know how a cache's tag RAM works, you might realize that the CPU can manage the cache better if the buffers start at "weird" addresses, especially if you use multiple buffers at a time (e.g. in the same loop). Just make the buffer addresses a multiple of 32, don't make their sizes a power of two (or leave some space between the buffers, even one dword is enough) and you're set.
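In practice that's just a question of how you allocate the buffers; something along these lines (the exact numbers don't matter, anything that breaks the power-of-two pattern does the job):

    #include <malloc.h>   // for _aligned_malloc (MSVC)

    // 32-byte aligned start address, but a size that is NOT a power of two -
    // one extra cache line per buffer is already enough padding
    static const int MIXBUFLEN = 4096 + 32;   // in floats

    float* AllocMixBuffer()
    {
        return (float*)_aligned_malloc(MIXBUFLEN * sizeof(float), 32);
    }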
Then, use at least a 32bit integer buffer or, better, a 32bit float buffer for your "final" output signal as it leaves the rendering stage. The same goes for every intermediate mixing buffer, as 16bit precision is much too low (processing such buffers more than a few times will produce a great amount of audible noise) and you wouldn't have ANY headroom if the signal were likely to clip. For integer buffers, treat the values as 1:7.24 fixed point; for float buffers, normalizing the signal to 1.0 is quite a good idea.
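The conversion back to 16 bits then happens exactly once, at the very end of the chain; for a float buffer normalized to 1.0 it's nothing more than this:

    // convert float samples (normalized to +/-1.0) to 16bit output, with clipping
    static void FloatTo16(short* dest, const float* src, int count)
    {
        for (int i = 0; i < count; i++)
        {
            float v = src[i] * 32767.0f;
            if (v >  32767.0f) v =  32767.0f;    // clip instead of letting it wrap around
            if (v < -32768.0f) v = -32768.0f;
            dest[i] = (short)v;
        }
    }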
So, the "render" part of the sound thread loop looks more like this:
And thus, your render function will just be called with a destination buffer and a number of samples, and you can write your synth or player or whatever in a completely OS/platform-independent way. If you want to port the system, just rewrite the sound output code. The same goes if you want to use waveOut or .wav writers or your favourite MP3 player's output plugin, or want to make your whole thing a VST2 plugin (use normalized and pre-clipped float buffers then :) or whatever.
At last, we have sound running in the background, not getting in the way of other CPU time critical routines, with perfect sync and in a nice modular fashion. And it's even easy to code. Do we need more?
"Yes, indeed, we want to have a synthesizer now"
Well, sorry, but more on this later. For one thing, you've got enough work to do if you've followed me to this point, and from now on things get tough. And second, I haven't finished those parts yet, so you'll have to wait.
Anyway, I hope this helped you in some way. If you've got any questions, comments or suggestions, simply send a mail to kb@kebby.org or catch me on IRC :)
until then...