Black Art of 3D Game Programming, Chapter 6: Dancing with Cyberdemons

The horrible creatures, living machines, and other manifestations transmitted through the gate of Cyberspace are not silent! In fact, with the right spells and incantations, we can listen to the eerie sounds and music transmitted to us. To do this, we must learn how to control the sound synthesis card within the PC and command it to couple its subspace field with that of the VGA card. The result: the unison of sight and sound creating the ultimate experience of reality.

Fundamentals of Sound

Figure 6-1: Simple and complex waveforms.

Sound is nothing more than mechanical waves traveling through air. These mechanical sound waves have a few simple properties that we shall discuss so that we have a common vocabulary to speak of them. For example, when you speak, your vocal cords resonate at different frequencies, and the sum of these resonations produces the final sound. Hence, the first concept we need to grasp is that of frequency. Simply put, frequency is the rate at which a particular waveform repeats itself as a function of time. For example, Figure 6-1 depicts two different waveforms with different frequencies. The first waveform is called a sine wave and is the purest of all waveforms. It has a frequency of 1000 Hz (Hz means cycles per second). The second waveform is more complex and has a frequency of 10 Hz.

The interesting thing about complex waveforms is that they can all be generated using a sum of sine waves. This is because sine waves are mathematically the most fundamental waveforms. Hence, if we can synthesize a set of sine waves, then theoretically we can synthesize any sound. More on this later.

Figure 6-2: Frequency response of the garden variety human ear.

The next important quantity of sound is amplitude or loudness. This has to do with the size of the waveform or, in the physical world, the strength of the mechanical air wave. Amplitude can be measured in many units, such as volts, decibels (dB), and so forth. But musicians usually measure amplitude relative to some baseline, since they don’t care how many volts or dBs a sound is. They simply need to know the loudest and softest sound that can be made with any particular piece of equipment and use those endpoints to scale all the sounds in between. For example, on your particular sound card, you may be able to set the loudness from 0 to 15. These values represent complete quiet (0) and full volume (15). The only way to get a feel for the loudness of the two extremes is to listen to them.

The next concept of importance is frequency response, which is the way we perceive different sounds. For example, most humans can only hear in the range of 20 Hz to 20,000 Hz. Anything beyond these two extremes will be highly attenuated. Figure 6-2 shows a diagram of the situation. As you can see, humans have a reasonable frequency response, well up to 20,000 Hz, and then the “response” falls off abruptly. Of course, some people have more range and some people have less, but 20 Hz to 20,000 Hz is the average. What does this mean? Well, it means the sounds that we produce must be within this range, or only insects and animals will hear them!

We have mentioned waveform a couple of times, but let’s pin the concept down a little better. Waveform means the shape of the sound wave. Figure 6-3 depicts some common waveforms, such as the sine wave, square wave, sawtooth wave, white noise, and human voice. The simplest of all, the sine wave, can’t be broken down any further. Sine waves sound very simple and “electronic” when listened to. The square wave, on the other hand, is constructed out of a collection of sine waves, each a harmonic of the fundamental frequency of the square wave. Harmonics are multiples of a frequency and subharmonics are submultiples. For example, the frequency 1000 Hz has its first harmonic at 2000 Hz, its second at 3000 Hz, and so forth. Similarly, its first subharmonic would be 500 Hz. Anyway, when you listen to a square wave it sounds “richer” than a sine wave; in other words, it has a bit of “texture.” The reason of course is that the square wave is composed of hundreds, if not thousands, of sine waves, each with a smaller amplitude than the one before it.

Figure 6-3: The waveforms of sound.

The next waveform in Figure 6-3 is called a sawtooth wave and has a “brassy” sound to it. The sawtooth is also composed of sine waves. The human voice waveform is called complex and has no specific frequency since it doesn’t repeat. However, the voice waveform is generated with thousands of sine waves that make up the final sound. Moreover, the amplitudes of these sine waves are modulated as a function of time while the word or phrase is being spoken. It’s actually very complex, but we can deduce a couple of things from any voice waveform or sample: the overall loudness and the highest frequency. For example, my voice may have most of its spectral energy in the 3 to 4 kHz range, while a woman’s voice may have most of the energy at a higher frequency, such as 6 kHz. This is important when sampling or digitizing voice. We must sample the voice at a rate of at least twice its highest frequency component if we want to reproduce the original sound perfectly.

The final waveform in Figure 6-3, white noise, has an equal distribution of all frequencies at all amplitudes up to some maximum. This is the sound of the ocean, wind, and other relaxing sounds. It’s also the form of noise in electronic circuitry and is very difficult to filter. It’s actually chaotic in nature–but that’s a story for another book!

We now have the basic tools to discuss the two main types of sounds that we’re going to create in this chapter, which are digital samples and synthetic music.

Digitized Sound

Figure 6-4: Conversion of voice to digital form.

Digitized sound is probably the easiest to play because we don’t need much hardware or software to do it. In essence, we must sample or record an analog signal (the sound) and convert it into digital form with an analog-to-digital converter. Figure 6-4 shows the conversion process in action. Note that the analog input waveform is sampled at some rate and then converted into binary words. There are two factors we must consider during the digitization process. First, the rate at which the sound is sampled must be high enough to capture the full spectral range of the input sound. This is called frequency resolution. Second, the analog-to-digital converter must have enough amplitude resolution to sample all the different amplitudes. If there’s not enough of one or the other, errors will occur, such as frequency or amplitude aliasing.

The need for amplitude resolution is easy enough to understand. Simply put, if a sound has thousands of amplitude variations and our analog-to-digital converter only has 4 bits of resolution (16 values), then the sampled sound is going to lose a lot of its amplitude range (see Figure 6-5). The reasoning behind the double sample rate is less obvious, but let’s see if we can derive it using common sense and fundamentals.

Figure 6-5: Loss of amplitude spectrum during the digitization process.

Recall that all sounds, simple and complex, are really a conglomeration of sine waves. Thus, even a sound such as a human voice is composed of sine waves. Therefore, it follows that any voice must have one component that contains the absolute highest frequency sine wave. For example, Figure 6-6 shows the average frequency spectrum of a man’s voice saying “hello.” You will notice that the largest spikes are centered around 700 Hz; however, there are also spikes all the way up to 12 kHz (as small as they may be). If we were to sample the voice at 2 kHz, we would get most of the information, but if we sampled it at 24 kHz, we would get it all. We’ll get to the doubling in a second, but first listen to this. Have you ever heard yourself on a tape or other recording device? Have you ever noticed that it just doesn’t sound like you? This is because some of those high or low frequencies are getting filtered. Even though the main portion of your voice is being sampled, some of the information gets lost during the recording. This occurs to different degrees depending on different voices. Some people sound perfect on the phone or tape, but others, because of the complex spectral content of their voices, sound weird. This is due to loss of information.

Figure 6-6: Average frequency spectrum of a man saying "hello."

So why must we sample at twice the rate we are trying to record? The reason is this: if we want to sample a sound or voice that has 4 kHz as the highest frequency, we must sample at 8 kHz so that we can reconstruct the original sine wave. Take a look at Figure 6-7. It shows the same waveform being sampled at two different rates, one rate half that of the other. You will notice that sometimes there aren’t enough data points to reconstruct the sine wave, so an error occurs. This stems from the fact that we need at least two points per cycle of any sinusoidal curve to solve for the frequency of the wave, just like you need two points to define a line. The whole thing is based on something called Shannon’s Sampling Theorem, and the lowest usable sample rate, twice the highest frequency in the sound, is called the Nyquist rate; sample any slower than that and the sound will be distorted when played back. That’s just for you sound buffs! Without going into all that, we have everything we need to sample and play back sound.

Figure 6-7: Effects of sample rate on digital reconstruction.

The moral of the story is that we should sample sounds at 6 kHz to 12 kHz (because of memory constraints) and use 8-bit samples unless high quality is necessary; then 16-bit samples will be more than ample. Personally, I get away with an 8 kHz sample rate with 8 bits of amplitude resolution all the time. The sound of cracking bones and photon torpedoes doesn’t need the resolution of a Mozart symphony.

How do we record and play digitized samples on a sound card? First off, the sound card must have a digital sound channel or at least a digital-to-analog converter to play the sounds. Sound Blaster and similar cards all have the ability to play digital sounds. Some cards can play back at rates up to 44.1 kHz in 16 bits. In most cases, sounds can also be digitized by the sound card, so it must have an input analog-to-digital converter (A/D) to satisfy this need. If a particular card does not have the ability to record, most cards can still play back digitized sounds as long as the sound is in some format that can be converted to the native format of the sound card. In the case of DOS games, the most widely used digital sound format is called VOC, which contains a header along with the data for the digital sample. VOC format supports multiple recording rates, stereo sound, and samples up to 16 bits, so it’s more than enough for us. We’ll learn how to read VOC files a bit later, but that’s what we’ll be using. Of course, there are other formats, such as WAV (Windows) and MOD (Amiga), but we’ll stick to VOC for simplicity. However, there’s no reason you can’t write a reader for the others.

The main problem facing the world of sound and video games is lack of standardization. Even if I showed you how to record and play music on a Sound Blaster, you would be out of luck on a Gravis Ultrasound card. Hence, instead of depending on a particular sound card, we’re going to use sound drivers from The Audio Solution and John Ratcliff for both digital samples and music. This will save us a great deal of time, and our code will work on any sound card without change.

Finally, most sound cards have onboard DMA logic and buffers that can play a sound from memory in parallel with the CPU, so the CPU isn’t loaded with any work. In essence, playing a digital sound effect will impose very little computational load on the system CPU and will be practically transparent. The only computations and overhead for playing a digital sound will occur when the sound is started by the driver; after that, the sound card will usually take over. With that in mind, let’s jam!

MIDI Music

Playing music on the computer is a daunting task. The nature of music itself is complex and mathematical. Music is basically a sequence of notes playing on different instruments as a function of time. However, there are about a million little details involved, such as velocity, aftertouch, and tempo, to name a few. Trying to concatenate all this information into a single standard is difficult to say the least, but a standard was finally arrived at in late 1982, and it is known as MIDI. MIDI stands for Musical Instrument Digital Interface, and you can find MIDI ports on most synthesizers and sound cards today. MIDI is a file format that describes a musical composition as a set of channels that are each playing a particular note of a particular instrument. The instruments are a generic collection of popular gadgets that the sound card or MIDI device tries to reproduce as best it can.

The standard MIDI specification states that there are 128 instruments, including pianos, guitars, percussion devices, and so forth. Hence, if a MIDI file indicates that an air guitar is to be played on channel 3, the sound card will do its best to output the sounds of an air guitar. Also, MIDI supports 16 different channels, or in essence, a MIDI stream can control 16 different instruments. So when MIDI music is played on a sound card, the sound card tries its best to support all 16 channels. For example, on the Sound Blaster channels 1 through 9 are usually used for normal instruments and channel 10 is allocated for the percussion device. The only problem with this is that each sound card and MIDI device has its own set of instrument “patches” that sound slightly different. For example, if you compose a song using a Sound Blaster and then play the MIDI file on a Turtle Beach card, the results might not sound totally correct. This is the reason why “instrument patches” came to be and Creative Labs created the CMF music format. CMF is similar to MIDI except that it has a header with instrument patches so that you can reprogram a Sound Blaster’s default instruments with ones that are more appropriate for the composition.

The CMF format is the ultimate in MIDI music, since it allows new instruments to be loaded, but it isn’t a standard, so we can’t really use it. On the bright side, most sound cards do a good job of emulating all the default instruments in the MIDI specification, and most songs will sound decent to the untrained ear (like mine!). The question is, how do we create these MIDI files? Well, if you’re a musician, it’s easy. Using a sequencer and a computer, you can compose MIDI music on your computer and save the file as a *.MID file for use with any MIDI-compatible sound system. If you’re not a musician, you’re in serious trouble, and the only answer is to use premade MIDI files or hire a composer to create the music for you. For example, a friend of mine here in Silicon Valley will compose MIDI music for you at about $250 per minute of music. You can reach him at:

Andromeda/Eclipse Productions
Attn: Dean Hudson
P.O. Box 641744
San Jose, CA 95164-1744

Once we have the MIDI music, how do we play it on the sound card? We will play the music using someone else’s drivers; that is, The Audio Solution’s and John Ratcliff’s, which are on the CD-ROM. Writing MIDI drivers for all the sound cards would take months to years, so we’ll have to use their drivers. This isn’t too bad since they are very complete, efficient, and sound great! Moreover, the interfaces are simple, and along with another layer that we will create on top of the standard Audio Solution/Ratcliff interface, playing music will be as easy as a few lines of code!

Now that we know what format the musical information is in, how is it synthesized? Good question! In most cases, music is synthesized using a technique referred to as FM synthesis, but a new technique that is becoming more affordable, called wave table synthesis, is much more realistic. Let’s discuss FM synthesis first, since it’s more common.

FM Synthesis

Figure 6-8: FM synthesis in action.

As we learned, all sounds are created with sine waves, and the more sine waves the more complex and rich the sound. Be warned, sound theory has a lot of words like rich, textured, brassy, hard, soft, and fluffy. They mean what they mean to you, get it? Anyway, generating sine waves, square waves, and sawtooth waves is easy, but the resulting sounds are artificial. A method had to be devised that would make it possible to add harmonics to some fundamental sine wave having varying amplitudes and then to modulate these harmonics as a function of time. The mathematical technique that accomplishes this is called frequency modulation or FM. It uses much the same technique that FM radio transmitters use to transmit a signal that encodes music on a specific carrier wave.

Figure 6-9: Effects of ADSR amplitude modulation on sound.

An FM synthesizer has two main components: the modulator and the carrier, as shown in Figure 6-8. The modulator modulates the base frequency of the carrier. Moreover, the output of this chain can be fed back into the input to create harmonics. Most low-cost sound cards use this technique. The cards have a set of FM synthesizers, usually 10 or 20 of them (combined to make channels), that are programmed with parameters to emulate a specific musical instrument. In addition, each channel has an ADSR amplitude control. ADSR stands for Attack Decay Sustain Release and controls the amplitude of the sound as a function of time. Figure 6-9 shows a typical ADSR envelope and the result of it on an FM synthesized instrument.

Using FM synthesis along with the concept of ADSR, we could write software to control a sound card to play a MIDI file. This isn’t easy and takes thousands of lines of code, but nevertheless can be done. This is exactly what The Audio Solution has done for us; however, they have written drivers not just for the Sound Blaster, but for many sound cards! This relieves us from a ton of work. Once we know how to use both the digital and MIDI drivers, we’re in business!

The only problem with FM synthesis is it’s too perfect! That’s right, real instruments and real people make slight mistakes in the music, little deviances. These deviances and “english” are lost in the translation since an FM synthesizer always reproduces the sound perfectly. Alas, sound engineers turned from FM synthesis and tried taking digital samples of instruments and using them to create music–with amazing results.

Wave Table Synthesis

Figure 6-10: Layout of a wave table synthesizer.

Wave table synthesis is becoming very popular and is based on the concept of digitizing the actual sound output of each instrument and then playing back the sounds when called upon by a MIDI file. The basic hardware is shown in Figure 6-10. As you can see, a wave table holds short digital samples of each instrument. When a MIDI instruction indicates that a particular instrument should be played, the digital data is taken from the wave table, amplified, and played along with the other sounds to make the final musical note(s).
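The core of wave table playback can be sketched in software. The hypothetical routines below step through a stored sample at an adjustable rate to change its pitch, using linear interpolation between stored values. Real hardware does this work on dedicated chips and in more refined ways, so treat this only as an illustration of the idea.

```c
// Read one value from a stored instrument sample. 'table' holds
// 'length' samples and 'position' is a fractional, non-negative read
// index. Linear interpolation fills in values between stored samples.
double wavetable_read(const short *table, int length, double position)
{
    int whole = (int)position;
    int i0 = whole % length;
    int i1 = (i0 + 1) % length;            // wrap around so the sample loops
    double frac = position - whole;

    return table[i0] + frac * (table[i1] - table[i0]);
}

// Fill 'out' with 'count' samples, advancing 'step' table entries per
// output sample. A step of 1.0 plays the sample as recorded, 2.0 plays
// it an octave higher, and 0.5 plays it an octave lower.
void wavetable_render(const short *table, int length, double step,
                      short *out, int count)
{
    double position = 0.0;
    int i;

    for (i = 0; i < count; i++)
    {
        out[i] = (short)wavetable_read(table, length, position);
        position += step;
        while (position >= length)         // keep the read index inside the table
            position -= length;
    }
}
```

A single recording can cover a range of notes this way, at the cost of the per-sample arithmetic shown in wavetable_read.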

The problem with wave table synthesis is its complexity. It requires a DSP (digital signal processor) to perform the mathematics for pitch, amplitude, and envelope changes in real-time. This has been the main limiting factor for wave table-based sound cards in the past; but today, with the price of processors decreasing and their performance increasing, wave table synthesis is becoming more common. The Sound Blaster AWE32 is an example of a wave table synthesizer, and when you listen to it, you will notice a difference. Instead of sounding synthesized, the music sounds like it’s being played by musicians in the next room.

The beauty of wave table synthesis is that MIDI composers don’t need to do anything different. They can still compose as if they were writing for an FM synthesizer because the MIDI specification doesn’t stipulate how the sounds are made, it only stipulates what the results should be. When wave table synthesis is coupled with instrument patches, its real power comes into play. I suspect a hundred dollar sound card will rival the performance of thousands of dollars of musical equipment in the near future.

Before we move on, there is one more new technology that I just want to hint at. It’s actually possible to produce microsized wave guides, and using these wave guides, it’s possible to synthesize physically the actual parameters of an instrument’s acoustics. This is theoretical stuff, but I bet you by the time this book is out, you might be able to find “virtual wave guide synthesis” sound cards.

The Audio Solution

Figure 6-11: Sound system architecture under The Audio Solution.

The main focus of this chapter is to learn how to play digitized sounds and music using the products DIGPAK and MIDPAK from The Audio Solution and John Ratcliff. The drivers were created by John Ratcliff and John Miles of Miles Design as solutions to playing digitized effects and music on myriad sound cards available. As Cybersorcerers, we only need to know two things about DIGPAK and MIDPAK: how they work and how to use them. Let’s begin by seeing how everything fits together.

Both MIDPAK and DIGPAK are TSR (terminate and stay resident) drivers that latch on to interrupt 66h (INT 66h). These drivers are installed via COM files or loaded by the application software. MIDPAK plays music and DIGPAK plays digitized sounds. To communicate with MIDPAK and DIGPAK, interrupt calls are made with the proper parameters in various registers. Figure 6-11 is a graphical representation of the architecture of the system. As you can see, MIDPAK and DIGPAK are chained together, and when an INT 66h is invoked, one of the drivers either absorbs or transmits the command–that is, each driver will only listen to commands that are meant for it. This is how both drivers can be located at the same interrupt.

The interesting thing about the drivers is that the DIGPAK digital sound driver doesn’t do much except direct the sound card to play a sound from memory or disk, and the sound card takes care of most of the work once the sound starts. The MIDPAK driver is different though. It must continually play MIDI music through the sound card and thus is called by an interrupt at a periodic rate. In essence, when we make a call to the MIDPAK driver using INT 66h, the driver does what we tell it, but another part of the driver (the active part) is continually called based on another interrupt that is hardware based.

The reason for this is that most sound cards can’t play music; they must be programmed to do so in real-time. And the best way to do this is through an interrupt. Hence, MIDPAK uses the standard timer interrupt that usually occurs 18.2 times a second, but reprograms it to a rate of 288 Hz. This is so music can be played with a high enough resolution to catch all the pitch, velocity, and amplitude changes in real-time. This reprogramming is done by MIDPAK, but it’s nice to know what’s going on.

Let’s review how everything works and add a little. First, both music and digital sound effects will be played using MIDPAK (music) and DIGPAK (digital effects). These drivers were written by The Audio Solution and are installed at interrupt 66h by running a COM file for each. In the case of MIDPAK, the COM file that loads the MIDI driver is called MIDPAK.COM. Also, it loads two other files called MIDPAK.ADV and MIDPAK.AD, which are the actual sound driver and instrument patches for your particular sound card. The digital sound driver is called DIGPAK and is loaded by running SOUNDRV.COM. Unlike the MIDPAK driver, the digital sound driver doesn’t load any extra files. Here’s a list of the files for each driver:

  • MIDPAK - The MIDI Driver
  • MIDPAK.COM - The interface to the low-level driver
  • MIDPAK.ADV - The MIDI driver for your sound card
  • MIDPAK.AD - The instrument patches for your sound card
  • DIGPAK - The Digital Sound Driver
  • SOUNDRV.COM

Of course, there is no law that states we must load both drivers. For example, we might want to use our own digital sound driver. In this case, we would only load the music driver. We can load one, the other, or both; but if both drivers are loaded, it’s a good idea to first load the digital driver SOUNDRV.COM and then the MIDI driver MIDPAK.COM on top of it.
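As an aside on the 18.2 Hz and 288 Hz timer rates mentioned earlier, both follow from simple arithmetic. On a standard PC the timer chip counts down from a divisor at a fixed input clock of 1,193,182 Hz and fires an interrupt each time the count runs out. The sketch below only does that arithmetic; how MIDPAK actually performs the reprogramming is not covered here, and the function names are hypothetical.

```c
// Input clock of the standard PC timer chip, in hertz.
#define PIT_CLOCK_HZ 1193182L

// Divisor (rounded to the nearest whole number) that makes the timer
// fire at roughly rate_hz interrupts per second.
unsigned long pit_divisor(long rate_hz)
{
    return (unsigned long)((PIT_CLOCK_HZ + rate_hz / 2) / rate_hz);
}

// Interrupt rate that a given divisor actually produces.
double pit_rate(unsigned long divisor)
{
    return (double)PIT_CLOCK_HZ / (double)divisor;
}
```

The power-up divisor of 65,536 gives pit_rate(65536) of about 18.2 interrupts per second, which is where the familiar figure comes from, and pit_divisor(288) works out to 4143.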

At this point, a couple of questions should come to mind: where do we get these drivers, and how do we communicate with them? Let’s begin by answering the first question.

Creating the Drivers

Two programs are used to create the drivers. One program creates the digital sound driver SOUNDRV.COM, and the second program creates the MIDI driver MIDPAK.COM and its associated files MIDPAK.ADV and MIDPAK.AD. Both programs are in the subdirectory DRIVERS under this chapter, and they are called:

  • SETM.EXE - A menu-driven program by John Ratcliff to create the MIDI drivers
  • SETD.EXE - A menu-driven program by John Ratcliff to create the digital sound driver

To create the driver files for your particular sound card, you must run both of the programs SETM.EXE and SETD.EXE in the DRIVERS directory, which will output the files SOUNDRV.COM, MIDPAK.COM, MIDPAK.ADV, and MIDPAK.AD. Then take these output files and place them into your application’s directory, so your game programs can access them.

If you view the contents of the DRIVERS directory, you will notice a lot of files that have various extensions along with familiar names of sound cards as the file names. These files generate the files for your particular card’s sound drivers for both DIGPAK and MIDPAK. Another interesting thing about SETM.EXE and SETD.EXE is that each program allows you to configure the I/O port and interrupt of your sound card. This information is written right into the drivers themselves! So, the application doesn’t ever have to change I/O ports or interrupts. The only downside to this is that the user of your game must run SETM.EXE and SETD.EXE, unless you are going to write a single installation program yourself.

Installing the Drivers

Once you have generated the driver files with SETM.EXE and SETD.EXE and copied them into your application’s working directory, installing the drivers is simple. At the DOS prompt enter SOUNDRV.COM and MIDPAK.COM, and the drivers will load into memory as TSRs. Of course, MIDPAK.COM will also load the files MIDPAK.ADV and MIDPAK.AD, but this is transparent to us. Just make sure the files are in the working directory of the application, so the loader can find them.

Once the drivers are installed, they will wait for commands to be sent to them via interrupt 66h. Hence, we must learn how these commands work and how to interface to INT 66h.

Playing Digitized Sounds with DIGPAK

DIGPAK is the driver used to play raw digital samples from memory. The DIGPAK interface is rich and complex, but can be invoked through interrupt 66h. However, to make the digital driver even easier to use, we’re going to write a single layer of software that sits on top of DIGPAK, creating a C interface. In most cases, the interface will simply call the appropriate DIGPAK function, but functions such as loading sound files and converting them to DIGPAK format aren’t supported by DIGPAK; hence, we will add this functionality.

The DIGPAK Interface

The DIGPAK interface has about 25 functions, and each function can be invoked via INT 66h. DIGPAK plays unsigned 8- or 16-bit digital sound samples, but DIGPAK has no concept of WAV, VOC, MOD, or any other file format. We must parse any input sound files and send DIGPAK a structure that contains:

  • A pointer to the raw 8- or 16-bit sound data
  • The size of the data in bytes
  • The frequency at which the sample should be played
  • A few diagnostic bytes

For instance, the software interface we're going to create will read VOC files and create the proper DIGPAK structure to play the VOC file after it's loaded into memory.

The primary data structure that DIGPAK uses is called a SNDSTRUC. Here is its definition:

// the DIGPAK sound structure

typedef struct SNDSTRUC_typ
        {
        unsigned char far *sound;  // a pointer to the raw sound data
        unsigned short sndlen;     // the length of the sound data in bytes
        short far *IsPlaying;      // a pointer to a variable that will be
                                   // used to hold the status of a playing
                                   // sound
        short frequency;           // the frequency in hertz that the
                                   // sound should be played at

        } SNDSTRUC, *SNDSTRUC_PTR;

To play a sound, we must first build up this structure and send it to DIGPAK.
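As a rough illustration of building the structure, the sketch below fills a buffer with an unsigned 8-bit square-wave tone and points a SNDSTRUC at it. The two helper functions are hypothetical and are not part of DIGPAK or of the API developed in this chapter. The `far` macro is an assumption that lets the same code compile on compilers that don't know the keyword, and the sample values 192 and 64 are arbitrary choices on either side of the unsigned midpoint of 128. Handing the structure to DIGPAK through INT 66h is not shown.

```c
// Assumption: outside 16-bit DOS compilers the 'far' keyword does not
// exist, so define it away to keep the declarations compiling.
#if !defined(__MSDOS__) && !defined(_MSDOS) && !defined(__DOS__)
#define far
#endif

// the DIGPAK sound structure, repeated from the text
typedef struct SNDSTRUC_typ
        {
        unsigned char far *sound;  // a pointer to the raw sound data
        unsigned short sndlen;     // the length of the sound data in bytes
        short far *IsPlaying;      // status of a playing sound
        short frequency;           // playback rate in hertz
        } SNDSTRUC, *SNDSTRUC_PTR;

// Hypothetical helper: fill a buffer with an unsigned 8-bit square wave
// of tone_hz, assuming playback at rate_hz samples per second.
void Example_Fill_Tone(unsigned char far *data, unsigned short length,
                       short rate_hz, short tone_hz)
{
    unsigned short half_period = (unsigned short)(rate_hz / (2 * tone_hz));
    unsigned short i;

    if (half_period < 1)           // guard against division by zero below
        half_period = 1;

    for (i = 0; i < length; i++)
        data[i] = ((i / half_period) % 2 == 0) ? 192 : 64;
}

// Hypothetical helper: point a SNDSTRUC at a block of raw sound data.
void Example_Build_Sndstruc(SNDSTRUC_PTR ss, unsigned char far *data,
                            unsigned short length, short rate_hz,
                            short far *status)
{
    ss->sound     = data;
    ss->sndlen    = length;
    ss->IsPlaying = status;
    ss->frequency = rate_hz;
}
```

At an 8000 Hz playback rate, a 500 Hz tone repeats every 16 samples, so the buffer alternates between 8 high samples and 8 low ones.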

Table 6-1: DIGPAK function.

We’re now ready to see the listing of the different functions that DIGPAK supports. But before we get to that, let’s discuss a couple of points. First, DIGPAK supports real and protected modes; thus it can be used for 32-bit games. However, we are only going to support the 16-bit DOS real mode interface in our discussions. If you are interested in more detail, take a look in the DRIVERS directory at the document named DIGPKAPI.DOC, written by John Ratcliff, for a complete list of functions. The list of functions in Table 6-1 is a subset of DIGPAK, and we will only use a few of those listed, but these are the main functions and in most cases will suffice.

To make using DIGPAK as easy as possible, we’re going to create another API on top of the functions listed in Table 6-1. This will insulate us from making the INT 66h calls by using a clean C interface and a few extra housekeeping functions.

Accessing DIGPAK from C

To make calls to DIGPAK from C, we need to make an INT 66h call with the proper parameters set up in the CPU registers. We can accomplish this using inline assembly or the _int86( ) function. Inline assembly is cleaner, so we’ll use it. The API that we are going to write will in most cases just pass parameters into the proper registers and call DIGPAK with an INT 66h. However, we must write one function ourselves, and that’s a function to load a VOC file, extract the sound data from it, and set up the DIGPAK sound structure properly. So we need to learn the format of VOC files and how to extract the different components from them.

Figure 6-12: A VOC file.

As shown in Figure 6-12, a VOC file consists of two portions: a header section and a data section. The header section contains variable length information, so the length of the header can vary. Because of this, the 20th byte of the header is always the length of the header section. Using this information, we can always compute the start of the data section. Hence, the 20th byte of the header can be used as an offset from the beginning of the VOC file to find the data section (hey, I didn’t make it up!). The data section contains the actual raw digital data of the sample in either 8-bit or 16-bit form. Since we’re only interested in 8-bit samples, all the VOC files we’ll read will only contain 8-bit unsigned data. Loading a VOC file is similar to reading any other type of file:

  1. The file is opened.
  2. A buffer is allocated.
  3. The data is read into the buffer.
  4. The length of the header is computed using the 20th byte in the file.
  5. The data is extracted and placed into the appropriate destination buffer.
  6. The file is closed.

Or something like that...
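The steps above can be sketched in portable C. This is a minimal illustration, assuming standard stream I/O and malloc( ) instead of the DOS calls used in this chapter; the names load_whole_file, voc_header_length, and voc_data_start are hypothetical helpers, not part of BLACK6.C or DIGPAK.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Steps 1-3 and 6: open the file, allocate a buffer, read everything,
   close the file. Returns NULL on any failure; the caller frees the
   buffer. (Hypothetical helper, stream I/O instead of DOS I/O.) */
unsigned char *load_whole_file(const char *path, long *size_out)
{
    FILE *fp;
    unsigned char *buffer;
    long size = 0;

    assert(path != NULL && size_out != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* find the file length by seeking to the end */
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) <= 0 ||
        fseek(fp, 0L, SEEK_SET) != 0)
    {
        fclose(fp);
        return NULL;
    }

    buffer = (unsigned char *)malloc((size_t)size);
    if (buffer != NULL && fread(buffer, 1, (size_t)size, fp) != (size_t)size)
    {
        free(buffer);
        buffer = NULL;
    }

    fclose(fp);
    if (buffer != NULL)
        *size_out = size;
    return buffer;
}

/* Step 4: header length stored at index 20 of the file image, or -1 if
   the buffer is too short or does not begin with "Creative". */
int voc_header_length(const unsigned char *buffer, long size)
{
    if (buffer == NULL || size <= 20)
        return -1;
    if (memcmp(buffer, "Creative", 8) != 0)
        return -1;
    return (int)buffer[20];
}

/* Step 5: pointer to the first byte after the header, or NULL on error. */
const unsigned char *voc_data_start(const unsigned char *buffer, long size)
{
    int header_length = voc_header_length(buffer, size);

    if (header_length < 0 || header_length >= size)
        return NULL;
    return buffer + header_length;
}
```

The sketch checks the whole word “Creative” and refuses buffers that are too short to contain index 20, which is a little stricter than strictly necessary but cheap insurance against feeding the wrong file to the sound system.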

Since we’re using DIGPAK to play digital sound samples, we must extract a few pieces of information from the VOC file to place into the SNDSTRUC data structure. These are the size of the data portion of the VOC file and the playback frequency or sample rate of the file. These must then be placed, along with a pointer to the actual sound data, into a SNDSTRUC and passed to DIGPAK. To make life a little easier, we’re going to create our own sound structure that will contain, among other things, a DIGPAK SNDSTRUC. The only reason for this is so that we can have a pointer to the start of the VOC file and add any fields in the future to the sound structure, such as information about what game object made the sound and so forth. The sound structure that we will use for our API is called sound and is defined below:

// our high level sound structure
typedef struct sound_typ
        {
        unsigned char far *buffer;  // pointer to the start of VOC file
        short status;               // the current status of the sound
        SNDSTRUC SS;                // the DIGPAK sound structure

        } sound, *sound_ptr;
Figure 6-13: The complete VOC file.

As you can see, it doesn’t hold much except a DIGPAK SNDSTRUC, the VOC buffer, and a status variable. However, if we want to add to it in the future, we can; whereas adding to the DIGPAK structure might not be a good idea since adding fields could mess up DIGPAK’s internal addressing! What do you think, John?
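To make the idea of extending the wrapper concrete, here is a small portable sketch. Everything in it is an assumption for illustration: SNDSTRUC_SKETCH is a stand-in that holds only the four fields used in this chapter (without far pointers, and with a field order that should be checked against the DIGPAK headers), and owner_id and sound_ex_init are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for DIGPAK's SNDSTRUC, limited to the fields this chapter
   fills in. The real definition lives in the DIGPAK headers; this
   portable copy is for illustration only. */
typedef struct
{
    unsigned char *sound;     /* start of the raw sample data */
    unsigned short sndlen;    /* length of the sample data in bytes */
    short *IsPlaying;         /* status semaphore */
    short frequency;          /* playback rate in Hz */
} SNDSTRUC_SKETCH;

/* Wrapper with one extra, hypothetical field. The embedded driver
   structure is left exactly as it is; new data goes around it. */
typedef struct sound_ex_typ
{
    unsigned char *buffer;    /* start of the VOC file in memory */
    short status;             /* our own status word */
    SNDSTRUC_SKETCH SS;       /* what the driver would see */
    int owner_id;             /* hypothetical: which game object made the sound */
} sound_ex, *sound_ex_ptr;

/* Fill in the wrapper; the driver-facing part only ever points at
   fields that belong to the wrapper. */
void sound_ex_init(sound_ex_ptr s, unsigned char *voc_file,
                   unsigned char *sample_data, unsigned short length,
                   short rate_hz, int owner_id)
{
    assert(s != NULL);

    s->buffer       = voc_file;
    s->status       = 0;
    s->SS.sound     = sample_data;
    s->SS.sndlen    = length;
    s->SS.IsPlaying = &s->status;
    s->SS.frequency = rate_hz;
    s->owner_id     = owner_id;
}
```

Because only the address of SS would be handed to the driver, adding or removing wrapper fields such as owner_id cannot disturb the layout the driver expects.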

We are almost ready to write a program to load a VOC file, but first let’s discuss a couple more details. For one, how do we compute the playback frequency of the VOC file? The sample rate can be found in the VOC file in the beginning of the data section in the “New Voice Block” section. You see, at the beginning of the digital data, there is actually a little header called the “New Voice Block.” This header is only 6 bytes, but nevertheless has some interesting information, such as the sample rate, among other things. Here’s the actual format of a VOC “New Voice Block”:

Byte Number    Function
0              Block type
1-3            Block length
4              Sample rate (SR)
5              Pack bytes
The value we’re interested in is located at the 4th byte and is called the sample rate. This byte has the following relationship to the original sample rate:

Sample rate in Hz = -1000000 / (SR - 256)
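As a quick check of the formula, here is a small portable C sketch. The function names are hypothetical, and the inverse conversion is an addition for convenience rather than something the VOC description above spells out.

```c
#include <assert.h>

/* Convert the SR byte into a rate in Hz. Same arithmetic as the
   formula above with the signs flipped:
   -1000000 / (SR - 256) == 1000000 / (256 - SR). */
long voc_sample_rate_hz(unsigned char sr)
{
    return 1000000L / (256L - (long)sr);
}

/* Inverse (an addition for illustration): the SR byte for a requested
   rate. Integer division truncates, so converting back may not give
   exactly the rate that was asked for. */
unsigned char voc_rate_to_sr(long rate_hz)
{
    /* SR must fit in one byte, which limits the usable range */
    assert(rate_hz >= 3906L && rate_hz <= 1000000L);
    return (unsigned char)(256L - 1000000L / rate_hz);
}
```

For example, an SR byte of 131 gives 1000000 / 125 = 8000 Hz, and an SR byte of 165 gives 1000000 / 91 = 10989 Hz after truncation.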

Hence, by retrieving the 4th byte of the data portion of the VOC file, which starts at an offset defined by the 20th byte in the header, we can compute the sample rate at which the sound effect should be played. This rate is then used to set up the DIGPAK SNDSTRUC. Figure 6-13 shows the relationship between the VOC headers and the VOC data. We are finally ready to write our first API function to load a VOC file off disk into memory and set up all the necessary data structures. Listing 6-1 contains the function.

Listing 6-1 Function to load a VOC file

int Sound_Load(char *filename, sound_ptr the_sound,int translate)
{
// this function will load a sound from disk into memory and pre-format
// it in preparation to be played

unsigned char far *temp_ptr;   // temporary pointer used to load sound
unsigned char far *sound_ptr;  // pointer to sound data

unsigned int segment,          // segment of sound data memory
             paragraphs,       // number of 16 byte paragraphs sound takes up
             bytes_read,       // used to track number of bytes read by DOS
             size_of_file,     // the total size of the VOC file in bytes
             header_length;    // the length of the header portion of VOC file

int sound_handle;              // DOS file handle

// open the sound file, use DOS file and memory allocation to make sure
// memory is on a 16 byte or paragraph boundary

if (_dos_open(filename, _O_RDONLY, &sound_handle)!=0)
   {
   printf("\nSound System - Couldn't open %s",filename);
   return(0);
   } // end if file not found

// compute number of paragraphs that sound file needs
size_of_file = _filelength(sound_handle);

paragraphs = 1 + (size_of_file)/16;

// allocate the memory on a paragraph boundary
_dos_allocmem(paragraphs,&segment);
// point data pointer to allocated data area
_FP_SEG(sound_ptr) = segment;
_FP_OFF(sound_ptr) = 0;

// alias pointer to memory storage area
temp_ptr = sound_ptr;

// read in blocks of 16k until file is loaded

do
 {
 // load next block
 _dos_read(sound_handle,temp_ptr, 0x4000, &bytes_read);

 // adjust pointer
 temp_ptr += bytes_read;

 } while(bytes_read==0x4000);

// close the file

_dos_close(sound_handle);

// make sure it's a voc file, test for "Creative"
if ((sound_ptr[0] != 'C') || (sound_ptr[1] != 'r'))
   {
   printf("\n%s is not a VOC file!",filename);

   // de-allocate the memory
   _dos_freemem(_FP_SEG(sound_ptr));

   // return failure
   return(0);

   } // end if voc file

// compute start of sound data;
header_length = (unsigned int)sound_ptr[20];

// point buffer pointer to start of VOC file in memory
the_sound->buffer       = sound_ptr;

// set up the SNDSTRUC for DIGPAK
the_sound->SS.sound     = (unsigned char far*)(sound_ptr+header_length+4);
the_sound->SS.sndlen    = (unsigned short)(size_of_file - header_length);
the_sound->SS.IsPlaying = (short far *)&the_sound->status;
the_sound->SS.frequency = (short)((long)(-1000000) /
                          ((int)sound_ptr[header_length+4]-256));
// now format data for sound card if requested
if (translate)
   Sound_Translate(the_sound);

// return success
return(1);

} // end Sound_Load

The Sound_Load( ) function takes only three parameters: the file name, a pointer to the sound structure, and a flag indicating whether audio translation is needed. The function of the first two parameters is obvious, but the third parameter, translate, is a bit mysterious. Remember that DIGPAK doesn’t know anything about VOC files; it only knows how to play raw 8- or 16-bit data. This means that if we want DIGPAK to play our audio data correctly and be able to play it more than once from memory without mangling it, then we must tell DIGPAK to translate the data using the MassageAudio function. Once we have done this, DIGPAK will be able to play the digital effect as many times as we wish, and it will sound right.

If you don’t use the MassageAudio function, the original data buffer holding the VOC file in memory may be altered as the sound is played. Hence, you should always MassageAudio when loading a file unless you aren’t going to use DIGPAK to play the sound (which will be the case when we make our own sound card synthesizer later in the chapter). Anyway, let’s review the function.

The first thing you will notice about the function is that it uses DOS-level I/O and memory allocation. This is because it is easier to use DOS I/O than stream I/O to load large chunks into memory, and DOS memory allocation functions always allocate memory on paragraph boundaries, which is a good idea for sound files. Once the sound file is loaded, it’s tested to see if it’s a valid VOC file. This is accomplished by testing the first few characters of the file to see if the word “Creative” is there. If so, the remaining portion of the code computes the length of the sound data and the playback frequency rate, and finally, sets up the appropriate pointers to the sound data structures. Notice that the sound structure fields IsPlaying and status currently don’t do much. The status field is part of our sound structure and can be used to track the status of the sound effect or whatever. The IsPlaying field of SNDSTRUC is used for a status semaphore issued by DIGPAK, but we won’t be using it. The semaphore is used by DIGPAK as a status holder, so it needs to be defined even though we aren’t going to use it.
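The arithmetic behind the paragraph allocation and the 16K read loop can be worked through in a short portable sketch. The function names are hypothetical; the constants come straight from Listing 6-1.

```c
#include <assert.h>

#define PARAGRAPH_SIZE 16L       /* bytes per DOS paragraph */
#define READ_BLOCK     0x4000L   /* 16K block used by the read loop */

/* Paragraph count as computed in Listing 6-1: 1 + size/16. */
long paragraphs_as_listed(long size_of_file)
{
    assert(size_of_file >= 0);
    return 1 + size_of_file / PARAGRAPH_SIZE;
}

/* Smallest count that still holds the file: size/16 rounded up. */
long paragraphs_minimum(long size_of_file)
{
    assert(size_of_file >= 0);
    return (size_of_file + PARAGRAPH_SIZE - 1) / PARAGRAPH_SIZE;
}

/* Number of read calls the do/while loop makes. It stops after the
   first block shorter than 16K, which may be a 0-byte read when the
   file length is an exact multiple of 16K. */
long read_calls(long size_of_file)
{
    assert(size_of_file >= 0);
    return size_of_file / READ_BLOCK + 1;
}

/* Real-mode linear address of segment:0000, which is why memory that
   starts at offset 0 of a segment sits on a paragraph boundary. */
long segment_base(unsigned int segment)
{
    return (long)segment * PARAGRAPH_SIZE;
}
```

For a 20,000-byte file the listing asks for 1,251 paragraphs where 1,250 would do; the spare paragraph is harmless and saves a rounding step. The same file takes two reads: one full 16K block and one short block that ends the loop.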

At the very end of the Sound_Load( ) function is a call to Sound_Translate( ), which is basically a wrapper for the MassageAudio function. The Sound_Translate( ) function is shown in Listing 6-2.

Listing 6-2 Function to translate and process a digital sound effect for playback by DIGPAK

void Sound_Translate(sound_ptr the_sound)
{
// this function calls the DIGPAK function massage audio to translate
// the raw audio data into the proper format for the sound card that
// the sound system is running on.

unsigned char far *buffer;

buffer = (unsigned char far*)&the_sound->SS;

_asm
   {
   push ds          ; save DS and SI on stack
   push si
   mov ax, 068Ah    ; function 3: MassageAudio
   lds si, buffer   ; move address of sound in DS:SI
   int 66h          ; call DIGPAK
   pop si           ; restore DS and SI from stack
   pop ds
                    ; have a nice day :)
   } // end inline assembly

} // end Sound_Translate

The function does nothing more than make a call to DIGPAK’s Function 3, MassageAudio. The only parameter the function needs to send to DIGPAK is the starting address of the SNDSTRUC that contains the sound effect information. This is aliased to a pointer and passed in DS:SI. Notice how SI and DS are saved on the stack since we alter them during the function.
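The command codes in this chapter's listings appear to follow a simple pattern: function 2 is 0689h, function 3 is 068Ah, function 4 is 068Bh, and function 8 is 068Fh. The sketch below encodes that pattern. Treat it as an inference from those four values only; Table 6-1 and DIGPKAPI.DOC remain the reference, and the names used here are hypothetical.

```c
#include <assert.h>

/* Function numbers as they are named in this chapter's listings. */
enum
{
    DIGPAK_SOUND_STATUS  = 2,
    DIGPAK_MASSAGE_AUDIO = 3,
    DIGPAK_DIGPLAY2      = 4,
    DIGPAK_STOP_SOUND    = 8
};

/* AX value for a DIGPAK function number, assuming the pattern
   AX = 0687h + n seen in the listings holds for that function. */
unsigned int digpak_ax(int function_number)
{
    assert(function_number >= 1);
    return 0x0687u + (unsigned int)function_number;
}
```

A helper like this is mostly useful as a cross-check when typing the codes into inline assembly, where a one-digit slip would silently call the wrong driver function.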

Next, we need a function to unload and deallocate the memory allocated by the Sound_Load( ) function to hold the VOC file. Listing 6-3 contains this function.

Listing 6-3 Function to unload a VOC file

void Sound_Unload(sound_ptr the_sound)
{

// this function deletes the sound from memory

_dos_freemem(_FP_SEG(the_sound->buffer));
the_sound->buffer=NULL;

} // end Sound_Unload

The Sound_Unload( ) function simply releases the memory back to DOS. Notice how the _dos_freemem( ) function is used instead of _ffree( ) (which would normally be used if the memory were allocated by one of the malloc( ) functions rather than DOS). This is a very important point: if you allocate memory with DOS, you must use DOS to deallocate it. The same fact applies for memory allocated with C. However, you can use both types of memory for working buffers and so forth.

We can now load a sound, unload a sound, and translate it to DIGPAK format, so let’s play it! The function to play a sound is one of the simplest, as shown in Listing 6-4.

Listing 6-4 Function to play a digital sound effect

void Sound_Play(sound_ptr the_sound)
{
// this function plays the sound pointed to by the sound structure

unsigned char far *buffer;

// alias sound structure

buffer = (unsigned char far*)&the_sound->SS;

_asm
   {
   push ds         ; save DS and SI on stack
   push si
   mov ax, 068Bh   ; function 4: DigPlay2
   lds si, buffer  ; move address of sound in DS:SI
   int 66h         ; call DIGPAK
   pop si          ; restore DS and SI from stack
   pop ds

   } // end inline assembly

} // end Sound_Play

Interestingly enough, the function body is almost identical to that of Sound_Translate( ). This is a direct result of The Audio Solution’s clean interface to their driver. This function simply sends the pointer of the SNDSTRUC, places the proper command code in AX, and calls INT 66h. The sound will then begin playing. Once the sound begins to play, the next thing we’ll want to do is track its status so we can detect when it’s complete. Thus, a status function is in order. The status function will be based on DIGPAK’s Function 2, SoundStatus, and the code is shown in Listing 6-5.

Listing 6-5 Function to query the status of DIGPAK's digital sound channel

int Sound_Status(void)
{
// this function will return the status of DIGPAK i.e. is a sound playing
// or not

_asm
   {
   mov ax, 0689h    ; function 2: SoundStatus
   int 66h          ; call DIGPAK

   } // end inline assembly
// on exit AX will be used as the return value, if 1 then a sound is playing
// 0 if a sound is not playing

} // end Sound_Status

The C function calls the SoundStatus function of DIGPAK and returns the status of the digital sound channel in AX. The result codes have been defined in BLACK6.H as:

#define SOUND_STOPPED    0      // no sound is playing
#define SOUND_PLAYING    1      // a sound is playing

For example, if we started a sound and wanted to wait for it to complete before continuing, the following line of code would suffice:

while(Sound_Status());
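A bare wait loop like this never exits if the driver keeps reporting that a sound is playing. As a hedged sketch, the loop can be given a poll limit. The status function is passed in as a pointer so the idea can be tried without a sound card; wait_for_sound, fake_sound_status, and fake_sound_start are hypothetical names, and the fake stands in for Sound_Status( ).

```c
#include <assert.h>

#define SOUND_STOPPED 0   /* no sound is playing */
#define SOUND_PLAYING 1   /* a sound is playing  */

typedef int (*status_fn)(void);

/* Poll until the status function reports SOUND_STOPPED or max_polls
   is used up. Returns 1 if the sound finished, 0 if we gave up. */
int wait_for_sound(status_fn status, long max_polls)
{
    long polls;

    assert(status != 0 && max_polls >= 0);

    for (polls = 0; polls < max_polls; polls++)
        if (status() == SOUND_STOPPED)
            return 1;

    return 0;
}

/* Hypothetical stand-in for Sound_Status(): reports "playing" for a
   fixed number of polls, then "stopped". */
static int fake_polls_left = 0;

void fake_sound_start(int polls)
{
    fake_polls_left = polls;
}

int fake_sound_status(void)
{
    if (fake_polls_left > 0)
    {
        fake_polls_left--;
        return SOUND_PLAYING;
    }
    return SOUND_STOPPED;
}
```

In a real program the poll limit would more likely be a time limit, but the shape of the loop stays the same.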

The final function we need is a way to stop a sound while it’s playing so that we can start another or simply turn off all sound output. Stopping a sound while it’s playing is accomplished using DIGPAK’s Function 8, StopSound. Listing 6-6 contains our C version of the function.

Listing 6-6 Function to stop a sound effect while it's playing

void Sound_Stop(void)
{
// this function will stop a currently playing sound

_asm
   {
   mov ax, 068Fh    ; function 8: StopSound
   int 66h          ; call DIGPAK

   } // end inline assembly

} // end Sound_Stop

The Sound_Stop( ) function stops a digital sound effect (if one is playing) and does nothing if a sound isn’t playing. In general, DIGPAK is fairly forgiving about calling functions when their results can’t possibly make sense, so our code doesn’t have to be totally bulletproof! We now have a complete C interface to play digital sounds using DIGPAK. You’ll notice that we didn’t implement a function for everything; for example, we left out volume control, but you can add these later, if you wish. Let’s review how everything fits together. First, we use the SETD.EXE program to create the digital sound driver SOUNDRV.COM and then load it at the DOS prompt. Then DIGPAK installs itself and hooks into INT 66h. Next, to play a sound using C, we would include BLACK6.H and BLACK6.C and write a bit of code like this:

sound boom;   // the sound

// load the VOC file

Sound_Load("boom.voc",(sound_ptr)&boom,1);

// play the sound
Sound_Play((sound_ptr)&boom);

// wait for sound to complete
while(Sound_Status());

//  delete the sound
Sound_Unload((sound_ptr)&boom);

It doesn’t get any better than that!

As an example, I have created a demo program that uses a technique called digital concatenation to create complete phrases out of smaller words. This technique is used when you call the operator for time and she tells you the time. The actual sounds you hear are concatenated. For example, we could digitize the vocalizations of:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100

Then we could concatenate the individual numerical vocalizations to create any number from 0 to 100. In essence, we can synthetically generate 101 numerical phrases from only 29 digital samples! For instance, 26 would be created by playing 20, then 6, in quick succession.
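The decomposition can be sketched in a few lines of portable C. This follows the 29-sample list above, where 0 through 20 each have their own recording, rather than any particular array layout; number_to_samples and count_distinct_samples are hypothetical names.

```c
#include <assert.h>

/* Split a number in 0..100 into the recorded values that must be
   played, in order. Returns how many entries of out[] were filled. */
int number_to_samples(int number, int out[2])
{
    int ones;

    assert(number >= 0 && number <= 100);

    /* 0..20 each have their own recording */
    if (number <= 20)
    {
        out[0] = number;
        return 1;
    }

    /* otherwise say the tens value, then the ones value if any */
    out[0] = (number / 10) * 10;
    ones   = number % 10;

    if (ones == 0)
        return 1;

    out[1] = ones;
    return 2;
}

/* Count how many different recordings are needed to say 0..100. */
int count_distinct_samples(void)
{
    int used[101] = {0};
    int parts[2];
    int n, i, count, total = 0;

    for (n = 0; n <= 100; n++)
    {
        count = number_to_samples(n, parts);
        for (i = 0; i < count; i++)
            used[parts[i]] = 1;
    }

    for (n = 0; n <= 100; n++)
        total += used[n];

    return total;
}
```

Running the counter over every number from 0 to 100 confirms the figure in the text: 29 recordings are enough.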

Anyway, the following demo uses this technique to create a verbal “Speak and Add” program that tests your skills of addition. The name of the program is DIGIDEMO.C and the executable is DIGIDEMO.EXE. To create the executable, you will need the previous library functions of BLACKLIB along with BLACK6.H and BLACK6.C. Moreover, you will have to generate the DIGPAK driver SOUNDRV.COM with the utility program SETD.EXE and load the driver from the command line before running DIGIDEMO.EXE. Listing 6-7 contains the source code.

Listing 6-7 A digital sound demo program

// DIGIDEMO.C - Digital sound demo

// I N C L U D E S ///////////////////////////////////////////////////////////

#include <io.h>
#include <conio.h>
#include <stdio.h>
#include <stdlib.h>
#include <dos.h>
#include <bios.h>
#include <fcntl.h>
#include <memory.h>
#include <malloc.h>
#include <math.h>
#include <string.h>

// include our sound library

#include "black3.h"
#include "black6.h"

// D E F I N E S /////////////////////////////////////////////////////////////

// defines for the phrases

#define SOUND_WHAT       0
#define SOUND_WRONG      1
#define SOUND_CORRECT    2
#define SOUND_PLUS       3
#define SOUND_EQUAL      4

// G L O B A L S /////////////////////////////////////////////////////////////

sound ones[11];   // this will hold the digital samples for 1..9
sound teens[11];  // this will hold the digital samples for 11-19
sound tens[11];   // this will hold the digital samples for 10,20,30,40...100
sound phrases[6]; // this will hold the phrases

// F U N C T I O N S /////////////////////////////////////////////////////////

void Say_Number(int number)
{
// this function uses digitized samples to construct whole numbers
// note that the teens i.e. numbers from 11-19 have to be spoken as a
// special case and can't be concatenated as the numbers 20-100 can!

int ones_place,
    tens_place;

// compute place values, use simple logic, more complex logic can be
// derived that uses MODING and so forth, but better to see whats going
// on. However, the main point of this is to see the digital library
// in action, so focus on that aspect.

// test for 0..9

if (number<10)
   {
   Sound_Play((sound_ptr)&ones[number]);
   while(Sound_Status()==1);
   return;
   } // end if 0..9

// test for 11..19

if (number >= 11 && number <= 19)
   {
   Sound_Play((sound_ptr)&teens[number-11]);
   while(Sound_Status()==1);
   return;
   } // end if 11..19

// now break number down into tens and ones

tens_place = number / 10;
ones_place = number % 10;

// first say tens place

Sound_Play((sound_ptr)&tens[tens_place-1]);
while(Sound_Status()==1);

// now say ones place (if any)

if (ones_place)
   {
   Sound_Play((sound_ptr)&ones[ones_place]);
   while(Sound_Status()==1);
   } // end if

} // end Say_Number

// M A I N ///////////////////////////////////////////////////////////////////

void main(int argc, char **argv)
{

char filename[16]; // used to build up filename

int number,    // loop variables
    number_1,
    number_2,
    answer,
    done=0;    // exit flag

float num_problems=0,  // used to track performance of player
      num_correct=0;

// load in the samples for the ones

for (number=1; number<=9; number++)
    {
    // build the filename

    sprintf(filename,"N%d.VOC",number);
    printf("\nLoading file %s",filename);

    // load the sound

    Sound_Load(filename,(sound_ptr)&ones[number],1);

    } // end for ones

// load in the samples for the teens

for (number=11; number<=19; number++)
    {
    // build the filename

    sprintf(filename,"N%d.VOC",number);
    printf("\nLoading file %s",filename);

    // load the sound

    Sound_Load(filename,(sound_ptr)&teens[number-11],1);

    } // end for teens

// load in the samples for the tens

for (number=10; number<=100; number+=10)
    {
    // build the filename

    sprintf(filename,"N%d.VOC",number);
    printf("\nLoading file %s",filename);

    // load the sound

    Sound_Load(filename,(sound_ptr)&tens[-1+number/10],1);

    } // end for tens

// load the phrases

printf("\nLoading the phrases...");

Sound_Load("what.voc",   (sound_ptr)&phrases[SOUND_WHAT     ],1);
Sound_Load("wrong.voc",  (sound_ptr)&phrases[SOUND_WRONG    ],1);
Sound_Load("correct.voc",(sound_ptr)&phrases[SOUND_CORRECT  ],1);
Sound_Load("plus.voc",   (sound_ptr)&phrases[SOUND_PLUS     ],1);
Sound_Load("equal.voc",  (sound_ptr)&phrases[SOUND_EQUAL    ],1);

// main event loop, note this one is not real-time since it waits
// for user input!

printf("\n                       S P E A K    N    A D D  !!!");
printf("\n\n\nThis program will test your skills of addition while demonstrating");
printf("\nthe digital sound channel in action!");

printf("\n\nJust answer each addition problem. To exit type in 0.\n");
printf("\n\nPress any key to begin!!!\n\n");

getch();

// the main event loop

while(!done)
     {

     // select two random numbers to add, but make sure their sum is less
     // than or equal to 100

     do
       {

       number_1 = 1 + rand()%99;
       number_2 = 1 + rand()%99;

       } while (number_1 + number_2 > 100);

     // ask user question

     printf("\nWhat is ");

     Sound_Play((sound_ptr)&phrases[SOUND_WHAT]);
     while(Sound_Status()==1);

     printf("%d",number_1);
     Say_Number(number_1);
     printf(" + ");

     Sound_Play((sound_ptr)&phrases[SOUND_PLUS]);
     while(Sound_Status()==1);

     Time_Delay(15);

     printf("%d",number_2);
     Say_Number(number_2);
     printf(" = ?");

     Sound_Play((sound_ptr)&phrases[SOUND_EQUAL]);
     while(Sound_Status()==1);

     scanf("%d",&answer);

     // make sure user isn't exiting

     if (answer!=0)
        {
        num_problems++;

        // test if user is correct

        if (answer == (number_1 + number_2))
           {
           Sound_Play((sound_ptr)&phrases[SOUND_CORRECT]);
           while(Sound_Status()==1);
           num_correct++;
           }
        else
           {
           // oops wrong answer!

           Sound_Play((sound_ptr)&phrases[SOUND_WRONG]);
           while(Sound_Status()==1);

           Say_Number(number_1+number_2);
           Time_Delay(25);

           } // end else wrong

        } // end if
     else
        {
        done = 1;
        } // end else user is exiting

     } // end main event loop

// unload all the sounds

for (number=1; number<=9; number++)
    Sound_Unload((sound_ptr)&ones[number]);

for (number=11; number<=19; number++)
    Sound_Unload((sound_ptr)&teens[number-11]);

for (number=10; number<=100; number+=10)
    Sound_Unload((sound_ptr)&tens[-1+number/10]);

for (number=0; number<=4; number++)
    Sound_Unload((sound_ptr)&phrases[number]);

// tell user his/her statistics

if (num_problems > 0)
   printf("\nYou got %.0f percent of the problems correct.",100*num_correct/num_problems);

} // end main

The program is actually very simple. It begins by loading all the prerecorded digital samples of the individual numbers along with a few phrases, such as “What Is?”. After this, the program enters the main event loop and selects two random numbers such that their sum is less than or equal to 100. Then a complete sentence is concatenated, based on the individual numerical phrases, and spoken. Then the user must type in the correct answer. The heart of the program is the function Say_Number( ), which basically computes the tens, ones, and teens places of the requested number, accesses the sound arrays and builds up the number, and plays the sounds. The main concepts to extract from the program are the ability to load multiple sounds, play sounds, and use digital sounds creatively.

Finally, we need to discuss the topic of memory consumption. Digital samples are notorious for eating memory alive. For example, if we were to sample 10 seconds of 8-bit mono sound at 44.1 kHz (the CD sample rate), we would eat up about 440K of memory (10 × 44,100 = 441,000 bytes)! What does this mean? It means that digital samples should be short, sweet, and to the point. Moreover, they should be sampled at a rate low enough so they don’t use all of the system memory. Of course, you can always swap sounds in and out from disk and create a virtual sound buffer system, but if you want to fit everything into the DOS 640K model, don’t get too crazy. Use a sample rate such as 6 to 8 kHz, which will be more than enough for most sound effects such as explosions, growls, and laser blasts.
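The memory cost of a sample is simple multiplication, and a small helper makes it easy to budget before recording. This is a sketch with a hypothetical function name; it assumes uncompressed data.

```c
#include <assert.h>

/* Bytes needed for an uncompressed sample:
   seconds * samples per second * bytes per sample * channels. */
long sample_bytes(long seconds, long rate_hz, int bits_per_sample,
                  int channels)
{
    assert(seconds >= 0 && rate_hz > 0);
    assert(bits_per_sample == 8 || bits_per_sample == 16);
    assert(channels == 1 || channels == 2);

    return seconds * rate_hz * (bits_per_sample / 8) * channels;
}
```

Ten seconds of 8-bit mono sound costs 441,000 bytes at 44.1 kHz but only 80,000 bytes at 8 kHz, and the same ten seconds in 16-bit stereo at 44.1 kHz would need 1,764,000 bytes, far more than the DOS 640K model can hold.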

Digital Recording Techniques

We now have a great digital sound library and interface, but where do we get the actual digital samples? That’s a good question. There are many commercial products that can digitize samples. If you purchase a Sound Blaster for example, the distribution disks come with a couple utilities for DOS and a WAV player for Windows that allows you to take digital samples. The Windows WAV player also allows sound files to be written out in either VOC or WAV format.

Personally, for simple stuff, I use a program called Blaster Master. The program is Shareware and designed for a Sound Blaster. It allows you to digitize samples up to 25 seconds in 8-bit, 16-bit, mono, or stereo. Also, the program can read and write multiple sound formats. The program is on the CD-ROM under the directory called BMASTER. To digitize sounds, you’ll need a sound source such as a mic or other input that is connected to the input of your sound card. Using Blaster Master’s menu-driven interface, you can digitize a sample and tweak it until it sounds good. Then you can write the sound file to disk and use it in your programs.

Looks like we’ve got digital sound down, but what about music? Well that’s the hard part, so wave your hand in the air and prepare to conjure up some real Cybersorcery!

Making Music with MIDPAK

MIDPAK is the MIDI music interface that we will use to generate music. We use interrupt 66h to communicate with the interface, and just as we did for DIGPAK, we are going to create a software layer on top of MIDPAK to make the calls to it a bit simpler. Before we begin designing the software interface layer, there is a small detail about the MIDPAK file format that needs to be discussed. MIDPAK does not use standard MIDI files for its input, but a special file format designed by Miles Design called Extended MIDI or XMI files. Let’s take a look at how these files are generated.

Creating Extended MIDI files

Figure 6-14: Operation of MIDIFORM.EXE.

Extended MIDI files are nothing more than a collection of MIDI files all concatenated in a single file. Using this format it becomes much easier to manipulate music in a game, since a single file contains all the music for the game. The XMI file is said to contain sequences. Each sequence is really a complete MIDI file. Hence, if we were to create an XMI file that had three MIDI songs within it, we would refer to the first one as sequence one, the second one as sequence two, and the third MIDI file as sequence three. However, due to the numbering system of MIDPAK, they would really be 0, 1, and 2 in the program code.

So how do we create XMI files? The MIDPAK kit comes with a program named MIDIFORM.EXE. You’ll find this program on the CD-ROM in the DRIVERS directory. Its syntax is very simple. In the most basic form it is used to create a single XMI file from a set of standard MIDI files. MIDIFORM has the following syntax:

MIDIFORM outputfile.XMI inputfile1.MID inputfile2.MID . . .

Figure 6-14 shows what’s going on with the following command:

MIDIFORM MUSIC.XMI INTRO.MID BATTLE.MID END.MID

Here MIDIFORM is being used to make a single XMI file that contains all the music for a single game. The introduction music, the main game music, and the ending music are all placed in a single XMI file. MIDPAK refers to each of these compositions as a sequence. For example, if we wanted to play the ending music, that would be sequence 2. We will learn how to “register” an XMI file with MIDPAK and play a sequence, but now we know the overall structure of the system.

To reiterate, MIDPAK doesn’t play MIDI files directly, we must convert the MIDI files into a single XMI file. The XMI file can contain one or more MIDI files, and each individual composition is referred to as a sequence and is numbered starting from 0. We are now in a position to write the interface, so let’s see what MIDPAK has to offer.
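Since sequence numbers simply follow the order of the MIDI files on the MIDIFORM command line, a game can keep that list in one place and look numbers up by name. The following is a minimal sketch with a hypothetical function name; it compares names exactly, so the spelling and case must match the list.

```c
#include <assert.h>
#include <string.h>

/* Return the 0-based sequence number of midi_name, given the MIDI
   files in the order they were passed to MIDIFORM, or -1 if the name
   is not in the list. */
int sequence_index(const char *const inputs[], int count,
                   const char *midi_name)
{
    int index;

    assert(inputs != NULL && midi_name != NULL && count >= 0);

    for (index = 0; index < count; index++)
        if (strcmp(inputs[index], midi_name) == 0)
            return index;

    return -1;
}
```

With the list INTRO.MID, BATTLE.MID, END.MID from the example command, looking up END.MID gives sequence 2, matching the numbering described above.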

The MIDPAK Interface

Table 6-2: MIDPAK functions

MIDPAK contains a set of functions similar to DIGPAK except they are used to play music instead of digital sounds. There are about 23 functions in MIDPAK; some are obsolete and others we won’t even be using. Table 6-2 lists the most important ones.

There are quite a few MIDPAK functions missing from the list, but they’re used to implement advanced features such as a polled driver, event triggering, and related subfunctions. If you want more detail, feel free to review the file on the CD-ROM named MIDPKAPI.DOC, which contains the complete functional listing. Admittedly, the only things I find of interest other than the functions in Table 6-2 are the event triggering functions. You see, many times in a game, we want to be able to determine where we are in the musical sequence so that an appropriate special effect can be displayed or logical decision can be made. Special MIDI commands called SYSEXs (system exclusive) take care of this. MIDPAK supports this, but it is beyond the scope of these pages. However, with some determination and a bit of detective work, you should be able to figure out how it works.

Now that we have the set of functions we are going to use, let's implement a C interface to access them in a simple manner.

Accessing MIDPAK from C

Unlike DIGPAK, MIDPAK has no “music structure.” MIDPAK simply needs to know the starting address of the XMI data that represents the desired musical sequences in memory. Hence, all we need to do is write a function that loads an XMI file and implement some C shells around all the MIDPAK functions, and we’re done! Playing an XMI sequence is almost too easy. Here are the steps:

  1. An XMI file is generated with MIDIFORM.EXE.
  2. The XMI file is loaded into memory.
  3. The XMI file is “registered” to MIDPAK using Function 5, RegisterXmidi.
  4. A selected sequence is played by calling Function 3, PlaySequence.

Step 2 is about the only thing we must write ourselves; the rest of the functions are simply calls to INT 66h.

We’re ready to write the interface library, but let’s add a data structure and some defines to make things easier. If you recall, MIDPAK doesn’t support any data structure as does DIGPAK, but to make life easier, we’ll create one that contains relevant information about a loaded XMI file. Here’s the data structure we are going to use:

// this holds a midi file

typedef struct music_typ
        {

        unsigned char far *buffer;  // pointer to midi data
        long size;                  // size of midi file in bytes
        int status;                 // status of song
        int register_info;          // return value of RegisterXmidiFile

        } music, *music_ptr;

The structure holds a pointer to the loaded XMI file, its length, a status variable, and a field that holds the results when the XMI file is registered to MIDPAK. The structure is probably overkill, but better safe than sorry. To further facilitate the ease of writing the functions and determining their results, here are a few defines to make our code clearer:

// return values for the midi sequence status function

#define SEQUENCE_STOPPED     0   // the current sequence is stopped
#define SEQUENCE_PLAYING     1   // the current sequence is playing
#define SEQUENCE_COMPLETE    2   // the current sequence has completed
#define SEQUENCE_UNAVAILABLE 0   // this sequence is unavailable

// these return values are used to determine what happened when a midi file
// has been registered

#define XMIDI_UNREGISTERED  0 // the midi file couldn't be registered at all
#define XMIDI_BUFFERED      1 // the midi file was registered and buffered
#define XMIDI_UNBUFFERED    2 // the midi file was registered, but was too
                              // big to be buffered, hence, the caller
                              // needs to keep the midi data resident
                              // in memory
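
As a small illustration of how the registration codes might be consumed, the sketch below turns them into readable text and answers the one question a caller usually cares about: must the XMI data stay in memory? The defines are copied from above; the two function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define XMIDI_UNREGISTERED  0 /* the midi file couldn't be registered at all */
#define XMIDI_BUFFERED      1 /* the midi file was registered and buffered   */
#define XMIDI_UNBUFFERED    2 /* registered, but too big to be buffered      */

/* Human-readable text for a registration result. */
const char *xmidi_register_text(int register_info)
{
    switch (register_info)
    {
    case XMIDI_UNREGISTERED: return "could not be registered";
    case XMIDI_BUFFERED:     return "registered and buffered";
    case XMIDI_UNBUFFERED:   return "registered, caller keeps data resident";
    default:                 return "unknown result";
    }
}

/* Nonzero when the caller has to keep the XMI data in memory. */
int xmidi_must_stay_resident(int register_info)
{
    return register_info == XMIDI_UNBUFFERED;
}
```

Printing the text form after a load is a cheap way to spot a song that is too large to be buffered.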

The first set of defines is used to determine the results of the SequenceStatus function, and the second set is to make sense of the results of the RegisterXmidiFile function. All right, now that we have laid the groundwork, let’s begin with the first and most important function, and that’s to load the XMI file. We’ll use DOS memory allocation and file functions as we did in the DIGPAK interface simply because I copied the functions and modified them! Listing 6-8 contains the function to load an XMI file off disk into memory and set up the music data structure appropriately.

Listing 6-8 Function to load an XMI file into memory

int Music_Load(char *filename, music_ptr the_music)
{
// this function will load a xmidi file from disk into memory and register it

unsigned char far *temp_ptr;    // temporary pointer used to load music
unsigned char far *xmidi_ptr;   // pointer to xmidi data

unsigned int segment,           // segment of music data memory
             paragraphs,        // number of 16 byte paragraphs music takes up
             bytes_read;        // used to track number of bytes read by DOS

long size_of_file;              // the total size of the xmidi file in bytes

int xmidi_handle;               // DOS file handle

// open the extended xmidi file, use DOS file and memory allocation to make sure
// memory is on a 16 byte or paragraph boundary
if (_dos_open(filename, _O_RDONLY, &xmidi_handle)!=0)
   {
   printf("\nMusic System - Couldn't open %s",filename);
   return(0);

   } // end if file not found

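// compute number of paragraphs that music file needs
size_of_file = _filelength(xmidi_handle);
paragraphs = 1 + (size_of_file)/16;

// allocate the memory on a paragraph boundary, give up if DOS has none to spare
if (_dos_allocmem(paragraphs,&segment)!=0)
   {
   printf("\nMusic System - Couldn't allocate memory for %s",filename);
   _dos_close(xmidi_handle);
   return(0);

   } // end if out of memory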

// point data pointer to allocated data area
_FP_SEG(xmidi_ptr) = segment;
_FP_OFF(xmidi_ptr) = 0;

// alias pointer to memory storage area
temp_ptr = xmidi_ptr;

// read in blocks of 16k until file is loaded
do
 {
 // load next block
 _dos_read(xmidi_handle,temp_ptr, 0x4000, &bytes_read);

 // adjust pointer
 temp_ptr += bytes_read;

 } while(bytes_read==0x4000);

// close the file
_dos_close(xmidi_handle);

// set up the music structure
the_music->buffer = xmidi_ptr;
the_music->size   = size_of_file;
the_music->status = 0;

// now register the xmidi file with MIDPAK

if ((the_music->register_info = Music_Register(the_music))==XMIDI_UNREGISTERED)
   {
   // delete the memory
   Music_Unload(the_music);

   // return an error
   return(0);

   } // end if couldn't register xmidi file

// else return success
return(1);

} // end Music_Load

The function takes two parameters, a file name and a pointer to a music structure. The function then proceeds to open the XMI file, determine its length, and load the file into memory. If there are any problems during this phase, the function returns an error. If all goes well, then the function sets the appropriate fields in the music structure and registers the XMI file with MIDPAK. After that, the function returns success. The registering of the XMI data is accomplished by a call to Music_Register( ), which simply calls the MIDPAK function to register an XMI file by address. The code for Music_Register( ) is shown in Listing 6-9.

Listing 6-9 Function to register an XMI file

int Music_Register(music_ptr the_music)
{

// this function registers the xmidi music with MIDPAK, so that it can be
// played

unsigned int xmid_off,   // offset and segment of midi file
             xmid_seg,
             length_low, // length of midi file in bytes
             length_hi;

// extract segment and offset of music buffer
xmid_off = _FP_OFF((the_music->buffer));
xmid_seg = _FP_SEG((the_music->buffer));

// extract the low word and high word of xmidi file length
length_low = the_music->size;
length_hi  = (the_music->size) >> 16;

// call MIDPAK

_asm
   {
   push si           ; save si and di
   push di
   mov ax,704h       ; function #5: RegisterXmidi
   mov bx,xmid_off   ; offset of xmidi data

   mov cx,xmid_seg   ; segment of xmidi data

   mov si,length_low       ; low word of xmidi length
   mov di,length_hi        ; hi word of xmidi length

   int 66h                 ; call MIDPAK
   pop di                  ; restore si and di
   pop si
   
   } // end inline assembly

// return value will be in AX

} // end Music_Register
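
The word splitting that Music_Register( ) performs can be tried out on its own. The sketch below is portable C rather than the 16-bit DOS code of Listing 6-9. The helper names low_word and high_word are invented for illustration and are not part of BLACK6.C, and the fixed-width types assume a compiler that provides <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of BLACK6.C): split a 32-bit file length
   into the two 16-bit words that would be loaded into SI and DI. */
static uint16_t low_word(uint32_t length)
{
    return (uint16_t)(length & 0xFFFFu);   /* bits 0..15 */
}

static uint16_t high_word(uint32_t length)
{
    return (uint16_t)(length >> 16);       /* bits 16..31 */
}
```

For a 70,000-byte file (11170h), low_word( ) yields 1170h and high_word( ) yields 0001h, which are the values that would travel in SI and DI.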

Music_Register( ) is actually quite simple except for the few calculations done to extract the low and high words of the necessary parameters. There are cleaner ways to do this, but this is the clearest (I think). Feel free to simplify the function if you wish. Once an XMI file has been loaded into memory, sooner or later it must be unloaded. This is done with a simple call to deallocate the memory. But remember, we used DOS to allocate the memory, so DOS must be used to free it. Listing 6-10 contains the function to release the XMI memory and unload the file.
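
The pairing of allocator and deallocator can be sketched outside DOS as well. The fragment below is an illustration only: malloc( ) and free( ) stand in for the DOS calls (on DOS the matching pair would be _dos_allocmem( ) and _dos_freemem( )), and the structure and function names are stand-ins rather than the ones in BLACK6.C. It also clears the fields after release, an extra precaution that the text does not describe.

```c
#include <stdlib.h>

/* Stand-in for the chapter's music structure. The field names follow the
   ones set in Music_Load( ); the type name is hypothetical. */
typedef struct music_sketch_typ
{
    unsigned char *buffer;   /* loaded XMI data, NULL when nothing is loaded */
    long size;               /* size of the data in bytes */
    int status;              /* playback status */
} music_sketch;

/* Allocate room for 'size' bytes; returns 1 on success, 0 on failure. */
int Music_Sketch_Alloc(music_sketch *m, long size)
{
    m->buffer = (unsigned char *)malloc((size_t)size);
    m->size   = (m->buffer != NULL) ? size : 0;
    m->status = 0;
    return (m->buffer != NULL);
}

/* Release the buffer with the allocator that produced it, then clear the
   fields so that a second call is harmless. */
void Music_Sketch_Unload(music_sketch *m)
{
    free(m->buffer);         /* free(NULL) is a no-op */
    m->buffer = NULL;
    m->size   = 0;
}
```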

Listing 6-11 Function to play a sequence

int Music_Play(music_ptr the_music,int sequence)
{
// this function plays an xmidi file from memory

_asm
   {
   mov ax,702h        ; function #3: PlaySequence
   mov bx, sequence   ; which sequence to play 0,1,2....
   int 66h            ; call MIDPAK

   } // end inline assembly

// return value is in AX, 1 success, 0 sequence not available

} // end Music_Play

The function takes as parameters a pointer to the music structure and the sequence number to play (starting from 0). The code makes a call to the PlaySequence function, and that’s about it. Amazing, huh? Four lines of assembly language to play a MIDI song. But remember, there are thousands of lines under all this that are doing the real work, so be glad we are using MIDPAK!

Once a sequence is playing, we may wish to stop it. A function that accomplishes this task is shown in Listing 6-12.

Listing 6-12 Function to stop a sequence

void Music_Stop(void)
{
// this function will stop the song currently playing

_asm
   {
   mov ax,705h   ; function #6: MidiStop
   int 66h       ; call MIDPAK

   } // end inline assembly

} // end Music_Stop

The function takes no parameters since only one sequence can be played at a time; therefore, it’s obvious which one is to be stopped. After stopping a sequence, it’s possible to resume the music with the function shown in Listing 6-13.

Listing 6-13 Function to resume a stopped sequence

void Music_Resume(void)
{
// this function resumes a previously stopped xmidi sequence

_asm
   {
   mov ax,70Bh   ; function #12: ResumePlaying
   int 66h       ; call MIDPAK

   } // end inline assembly

} // end Music_Resume

Again, Music_Resume( ) needs no parameters since the sequence to resume is implied as the one that was stopped. The last function we need to implement is a status function so we can determine what’s going on with a running sequence. Listing 6-14 contains the implementation of the SequenceStatus function.

Listing 6-14 Function to determine the status of a sequence

int Music_Status(void)
{
// this function returns the status of a playing sequence

_asm
   {
   mov ax,70Ch   ; function #13: SequenceStatus
   int 66h       ; call MIDPAK

   } // end inline assembly

// return value is in AX

} // end Music_Status

The function takes no parameters, but returns a status code as defined by the status codes in BLACK6.H. This completes our music interface. As an example, let’s use the functions in a pseudoprogram to load and play an XMI file. Here’s the code:

music the_music; // the XMI structure

// load an XMI file
Music_Load("song.xmi",(music_ptr)&the_music);

// start the music playing, play sequence 0

Music_Play((music_ptr)&the_music,0);

// wait for song to end
while(Music_Status() == SEQUENCE_PLAYING);

// unload the XMI data i.e. release the memory
Music_Unload((music_ptr)&the_music);

Of course, we could stop the music and resume it with the extra functions, but the above fragment indeed plays a song and exits when it’s complete. As a real example of using MIDPAK, I have created a program called MIDIDEMO.EXE. The program is menu driven and allows you to load XMI files from DOS, play sequences, stop the music, and resume the sequences. The program and its source (MIDIDEMO.C) are on the CD-ROM. To create an executable, you’ll have to link the program with the module BLACK6.C and all the previous modules accordingly. Listing 6-15 contains the source for the program.

Listing 6-15 A complete program to load XMI files and manipulate sequences

// MIDIDEMO.C - MIDI Music Demo

// I N C L U D E S ///////////////////////////////////////////////////////////

#include <io.h>
#include <conio.h>
#include <stdio.h>
#include <stdlib.h>
#include <dos.h>
#include <bios.h>
#include <fcntl.h>
#include <memory.h>
#include <malloc.h>
#include <math.h>
#include <string.h>

// include our sound library

#include "black3.h"
#include "black6.h"

// M A I N ///////////////////////////////////////////////////////////////////

void main(int argc, char **argv)
{

char filename[16];  // the .XMI extended midi filename

int done=0,    // main loop exit flag
    loaded=0,  // tracks if a file has been loaded
    sequence,  // sequence of XMI extended midi file to be played
    select;    // the users input

music song;    // the music structure

// main event loop

while(!done)
     {
     // print out menu

     printf("\n\nExtended MIDI Menu\n");
     printf("\n1. Load Extended MIDI file.");
     printf("\n2. Play Sequence.");
     printf("\n3. Stop Sequence.");
     printf("\n4. Resume Sequence.");
     printf("\n5. Unload Extended MIDI File from Memory.");
     printf("\n6. Print Status.");
     printf("\n7. Exit Program.");

     printf("\n\nSelect One ?");

     // get input

     scanf("%d",&select);

     // what does user want to do?

     switch(select)
           {
           case 1:  // Load Extended MIDI file
                {
                printf("\nEnter Filename of song ?");
                scanf("%15s",filename);

                // if a song is already loaded then delete it

                if (loaded)
                   {
                   // stop music and unload sound

                   Music_Stop();
                   Music_Unload((music_ptr)&song);

                   } // end if loaded

                // load the new file

                if (Music_Load(filename,(music_ptr)&song))
                   {
                   printf("\nMusic file successfully loaded...");

                   // flag that a file is loaded

                   loaded = 1;

                   }
                else
                   {
                   // error

                   printf("\nSorry, the file %s couldn't be loaded!",filename);

                   } // end else

                } break;

           case 2: // Play Sequence
                {
                // make sure a midi file has been loaded

                if (loaded)
                   {
                   printf("\nWhich Sequence 0..n ?");
                   scanf("%d",&sequence);

                   // play the requested sequence

                   Music_Play((music_ptr)&song,sequence);

                   } // end if loaded
                else
                   printf("\nYou must first load an extended MIDI file.");

                } break;

           case 3: // Stop Sequence
                {
                // make sure a midi file has been loaded

                if (loaded)
                    Music_Stop();
                else
                   printf("\nYou must first load an extended MIDI file.");

                } break;

           case 4: // Resume Sequence
                {
                // make sure a midi file has been loaded

                if (loaded)
                    Music_Resume();
                else
                   printf("\nYou must first load an extended MIDI file.");

                } break;

           case 5: // Unload Extended MIDI File from Memory
                {
                // make sure a midi file has been loaded

                if (loaded)
                   {
                   Music_Stop();
                   Music_Unload((music_ptr)&song);
                   loaded=0;
                   }
                else
                   printf("\nYou must first load an extended MIDI file.");

                } break;

           case 6: // Print Status
                {

                printf("\nMIDPAK Status = %d",Music_Status());

                } break;

           case 7: // Exit Program
                {
                // delete music and stop

                if (loaded)
                   {
                   Music_Stop();
                   Music_Unload((music_ptr)&song);
                   loaded=0;
                   }

                done=1;

                } break;

           default:
                  {
                  printf("\nInvalid Selection!");
                  } break;

           } // end switch

     } // end while

// unload file if there is one

if (loaded)
   Music_Unload((music_ptr)&song);

} // end main

The program is nothing more than a menu system shell around our functions. To run the program, you’ll have to use one of the XMI files on the CD-ROM or create one with some of the MIDI files of your own via MIDIFORM.EXE. I suggest you get a feel for the process. And of course, for the demo to operate, you must have loaded MIDPAK.COM into memory, which will in turn load MIDPAK.ADV and MIDPAK.AD.

Real-Time Sound Processing

At this point we’re going to diverge a bit from creating sound and music and talk about how to manipulate and mix sounds. Sound cards exist that perform some of these operations for us, but then again, we will have to perform some of them ourselves, so it’s nice to understand the underlying principles. We’re going to study the two most basic sound processing operations, which are amplitude amplification and mixing.

Amplification and Attenuation

Figure 6-15: Effects of distance on sound amplitude.

A digitized sound is sampled such that each sample's amplitude is stored in either 8 or 16 bits. Hence, the sound has 8 or 16 bits of amplitude resolution. There are many times during a game that we might want to amplify or attenuate a digitized sound. For example, if an object is far away from the player’s viewpoint and has a sound attached to it, the sound should be barely audible, but if the object moves nearer, the sound should become louder. Figure 6-15 shows this graphically. We see that the nearer sound has the same waveform as the more distant sound, but its amplitude is larger.

This amplitude amplification is accomplished simply by taking the original sound and multiplying each element of the data by a constant and placing the data in a destination buffer. Then the destination buffer contains the new amplified sound. For example, say that buffer[ ] contains a sound that is 8000 samples long, with each sample stored as an unsigned char. To amplify the sound, we could do something like this,

unsigned char buffer[8000], destination[8000]; // these will hold the sounds

for (index=0; index<8000; index++)
    destination[index] = (unsigned char)( (float)buffer[index]* amplitude + .5);

where amplitude is a floating point number.

The above fragment can be used either to amplify the sound, if amplitude is greater than 1.0, or to attenuate it, if amplitude is less than 1.0. When amplifying, the product can exceed 255, so the result should be clipped before it is stored in an unsigned char. The other problem with this method is that the calculations must be done with some precision; hence, floating point calculations of some sort must be used and the actual calculations may take a few milliseconds, which may be unacceptable in a game. Of course, if the amplification or attenuation factor is always a whole number, then floating point calculations can be avoided completely.

Using this amplification, we could write a function that uses a digitized sound as the input source and, based on an amplification factor, generates a new sound with the requested amplitude. A function of this sort would be useful in a 3D game to simulate 3D sounds occurring due to distance on sound cards that don’t support hardware volume adjustments.
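
As a sketch of such a function, the fragment below scales a buffer using integer math only. It rests on a few assumptions that go beyond the text: the samples are unsigned 8-bit values with silence at 128 (so scaling is done around that midpoint rather than on the raw byte), the gain is passed as a fixed-point number where 256 means 1.0, and results are clipped to 0..255 rather than allowed to wrap. The name Sound_Scale is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: scale unsigned 8-bit samples around the 128 midpoint
   by gain_q8/256 (256 = unity gain), clipping to 0..255 instead of wrapping. */
void Sound_Scale(const unsigned char *source, unsigned char *destination,
                 size_t length, int gain_q8)
{
    size_t index;

    for (index = 0; index < length; index++)
    {
        /* remove the midpoint so that silence stays silent at any gain */
        long centered = (long)source[index] - 128;

        /* fixed-point multiply, then put the midpoint back */
        long scaled = (centered * gain_q8) / 256 + 128;

        /* clip to the range of an unsigned 8-bit sample */
        if (scaled < 0)   scaled = 0;
        if (scaled > 255) scaled = 255;

        destination[index] = (unsigned char)scaled;
    }
}
```

A gain_q8 of 128 halves the amplitude and 512 doubles it. Because only integer multiplies and divides are involved, a table of gains indexed by distance could be computed once and reused every frame.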

Mixing Sounds

Figure 6-16: Mixing sounds with software.

Mixing is the next interesting operation that can be done to digital sounds. Mixing involves taking one or more inputs and summing them into a single output. Figure 6-16 shows this graphically. Here we see N input sounds all being summed and the result placed into a single destination buffer. There are some factors to consider when summing or mixing sounds. The first is whether to use a weighted average or just blindly sum the sounds and clip the results to fit into bytes or words (depending on the sample size).

Mathematically speaking, it may seem more correct to weigh each sound. For example, if two sounds were to be added together, the following code could be used to implement equal weighting:

for (index=0; index<length_of_sound; index++)
    destination[index] = (unsigned char)( (float)sound_1[index]*.5 +
                         (float)sound_2[index] * .5 + .5);

Since there are two sounds, each sound is weighted by 50 percent or 0.5. Even though this may seem like the correct thing to do, the result doesn’t sound correct. Amazingly enough, to add two or more sounds together, the results are better if the sounds are simply added without weighting, for example:

for (index=0; index<length_of_sound; index++)
    destination[index] = (sound_1[index]+sound_2[index]);

Figure 6-17: Mixing sounds of different lengths.

Note that the length of the destination must be the size of the longest sound, and during the mixing of the two sounds, zeros should be used to pad the shorter sound. This is shown in Figure 6-17. The length of the longer sound is determined, and then a for loop sums the two sounds together until the data for the shorter sound has been consumed. Then the remainder of the longer sound is simply copied into the end of the destination buffer. Here’s a code fragment that adds two sounds together without weighting while it takes into consideration different lengths:

// test for same length

if (length_1==length_2)
   {
   for (index=0; index<length_1; index++)
         destination[index] = sound_1[index]+sound_2[index];
   } // end if equal lengths
else
if (length_1>length_2)
   {

   for (index=0; index<length_2; index++)
        destination[index] = sound_1[index] + sound_2[index];

   // now copy the rest of sound 1 into buffer

   for (; index<length_1; index++)
        destination[index] = sound_1[index];

   } // end if sound 1 is longer than sound 2
else
   {
   // sound 2 must be longer than sound 1

   for (index=0; index<length_1; index++)
        destination[index] = sound_1[index] + sound_2[index];

   // now copy the rest of sound 2 into buffer

   for (; index<length_2; index++)
        destination[index] = sound_2[index];

   } // end else sound 2 is longer than sound 1

Using the above techniques, you can create all kinds of interesting effects, such as echoes, reversing, and others. Digital sounds are a lot of fun to experiment with!
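
One thing the fragments above leave out is clipping: two loud samples can sum past 255 and wrap around in an unsigned char. The sketch below folds the three length cases into one routine and clips the result. It assumes unsigned 8-bit samples with silence at 128, so the midpoint is subtracted once from each sum (a departure from the raw sums shown above). Sound_Mix and clamp_sample are hypothetical names.

```c
#include <stddef.h>

/* Clip a summed value to the range of an unsigned 8-bit sample. */
static unsigned char clamp_sample(int value)
{
    if (value < 0)   return 0;
    if (value > 255) return 255;
    return (unsigned char)value;
}

/* Hypothetical helper: mix two unsigned 8-bit sounds of any lengths.
   destination must have room for the longer of the two sounds.
   Returns the number of samples written. */
size_t Sound_Mix(const unsigned char *sound_1, size_t length_1,
                 const unsigned char *sound_2, size_t length_2,
                 unsigned char *destination)
{
    size_t shorter = (length_1 < length_2) ? length_1 : length_2;
    size_t longer  = (length_1 < length_2) ? length_2 : length_1;
    const unsigned char *tail = (length_1 < length_2) ? sound_2 : sound_1;
    size_t index;

    /* sum the overlapping part, removing one copy of the 128 midpoint */
    for (index = 0; index < shorter; index++)
        destination[index] = clamp_sample((int)sound_1[index] +
                                          (int)sound_2[index] - 128);

    /* copy what is left of the longer sound */
    for (; index < longer; index++)
        destination[index] = tail[index];

    return longer;
}
```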

Sound Scheduling

We’re getting closer to generating our own sound, but we need to cover the topic of scheduling. Simply put, the sound card is a finite resource, and it can only play one digital effect and one MIDI song at once (some cards have multiple digital and MIDI channels, but most don’t). Hence, some sort of system must be devised to allow for contingencies and resource sharing among all the sounds and music in a game. For example, I’m sure you’ve played a game and blown something up, and just as the sound of the explosion was at its peak, it just stopped! This is because the sound was preempted by something else, maybe another object in the game or even your own laser blast. This situation is unfortunate, but it does happen. To ease the amount of lost and preempted sounds, some sort of sound scheduler is used for both music and effects. This sound scheduler is something like a tiny operating system kernel that can queue up sounds, play sounds, preempt a playing sound for one of a higher priority, and “age” sounds that are waiting in the queue to determine if there’s still enough time to play them anymore. I’m not going to supply you with a sound scheduler, since the software will vary from game to game, but I will give you some ideas and a few designs so you can make your own. Here is a list of the different classes the schedulers fall into:

  • Fully preemptive
  • Priority-based preemption
  • Queued priority-based preemption
  • Real-time queued priority-based multiple synthesized digital channels

Fully Preemptive

A fully preemptive sound system is the simplest to implement. The basic idea is that when a new sound is to be played, the currently playing sound (if any) is stopped and the new sound is started. This technique is actually very common for low-end games. Implementing the system is as easy as playing the next sound with the Sound_Play() function, without any regard for the status of the sound system.

Priority-Based Preemption

A priority-based preemptive sound system operates under the premise that each sound has a specific priority attached to it. For example, the sound of a droning engine in the background may have a low priority, such as 10, while the sound of a laser blast may be a 5, and finally, the sound of an explosion may be a 1 (highest priority). Hence, as a sound is played, its priority is recorded in a global variable. Then when another effect is to be played, its priority is compared to that of the currently playing sound. If the new sound’s priority is greater (that is, numerically less), the current sound is preempted and the new sound is started.

Using priority-based preemption is a vast improvement over total preemption, but it still lacks any form of memory. For example, if a short sound is playing and one of slightly lower priority is requested, the second sound will be thrown away, even though in a few more cycles the initial shorter sound might have completed. A solution to this is queuing up sounds and playing them when it’s their turn.
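
The bookkeeping for priority-based preemption is small enough to sketch in a few lines. Everything here is an assumption made for illustration: the names Sched_Request and Sched_Finished, the rule that a request of equal priority is dropped, and the flag standing in for a query to the sound driver. The calls that would actually stop and start the effect are left as comments.

```c
/* Hypothetical scheduler state. Smaller numbers mean higher priority,
   as in the text. */
static int sched_active   = 0;    /* 1 while an effect is playing */
static int sched_priority = 0;    /* priority of the effect now playing */
static int sched_sound_id = -1;   /* id of the effect now playing */

/* Ask for a sound to be played. Returns 1 if it was started (preempting
   the current effect if necessary) or 0 if the request was dropped. */
int Sched_Request(int sound_id, int priority)
{
    /* equal or lower priority than the current effect: throw it away */
    if (sched_active && priority >= sched_priority)
        return 0;

    /* a real game would stop the current effect and start the new one here */
    sched_sound_id = sound_id;
    sched_priority = priority;
    sched_active   = 1;
    return 1;
}

/* Call when the driver reports that the current effect has ended. */
void Sched_Finished(void)
{
    sched_active   = 0;
    sched_sound_id = -1;
}
```

With the priorities used in the text, an explosion (1) requested while the engine drone (10) is playing takes over the channel, while a laser blast (5) requested during the explosion is dropped.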

Queued Priority-Based Preemption

We can improve the priority-based preemption by adding a small memory queue that queues up the requested sounds. For example, using the preemptive technique, if the program requests a sound to be played that has the same priority as the currently playing sound, then either the new sound is played or it isn’t. A better solution would be to place the request in a queue and wait for the current sound to complete. After its completion, the queue is consulted and the waiting sound is played. The only problem with this technique is that a sound will be played some time after it was actually requested. This can lead to lag. For example, if the player fires a missile, we surely can’t queue up a missile sound and play it three seconds later!

To avoid this dilemma, a second part of the queue’s logic should age each of the entries, and if they haven’t been played within a specific number of game frames, remove them from the queue. For example, if a sound of a car driving in the distance was queued, it could be aged a long time without much trouble, but the sound of an explosion or something within view of the player must be started within a few frames of its initial request, otherwise, it must be removed from the queue.
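
A queue with aging might be sketched as follows. The fixed-size array, the names, and the exact aging rule (an entry is dropped once its frame allowance reaches zero) are assumptions chosen to keep the example short; a real game would tune all of them.

```c
#define PENDING_MAX 8

/* One waiting request. All names in this sketch are hypothetical. */
typedef struct pending_sound_typ
{
    int sound_id;      /* which effect to play */
    int priority;      /* smaller number = higher priority, as in the text */
    int frames_left;   /* game frames the request may still wait */
} pending_sound;

static pending_sound pending[PENDING_MAX];
static int pending_count = 0;

/* Queue a request; returns 1 if stored, 0 if the queue is full. */
int Pending_Add(int sound_id, int priority, int max_wait_frames)
{
    if (pending_count >= PENDING_MAX)
        return 0;

    pending[pending_count].sound_id    = sound_id;
    pending[pending_count].priority    = priority;
    pending[pending_count].frames_left = max_wait_frames;
    pending_count++;
    return 1;
}

/* Call once per game frame: age every entry and drop the stale ones. */
void Pending_Age(void)
{
    int read_index, write_index = 0;

    for (read_index = 0; read_index < pending_count; read_index++)
    {
        pending[read_index].frames_left--;
        if (pending[read_index].frames_left > 0)
            pending[write_index++] = pending[read_index];
    }
    pending_count = write_index;
}

/* Call when the channel is free: remove and return the id of the waiting
   sound with the highest priority (oldest first on ties), or -1 if none. */
int Pending_Next(void)
{
    int index, best = 0, sound_id;

    if (pending_count == 0)
        return -1;

    for (index = 1; index < pending_count; index++)
        if (pending[index].priority < pending[best].priority)
            best = index;

    sound_id = pending[best].sound_id;
    for (index = best; index < pending_count - 1; index++)
        pending[index] = pending[index + 1];
    pending_count--;
    return sound_id;
}
```

A distant car could be queued with an allowance of a hundred frames or more, while an explosion in view might get only two or three, so it is either played almost at once or not at all.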

A queued system supplemented with priority and aging performs very well, but the ultimate solution to the problem is to synthesize a set of virtual digital channels from the single channel that most sound cards are equipped with.

Real-Time Queued Priority-Based Multiple Synthesized Digital Channels

We have already seen how to mix sounds to create an output that is the sum of the input sounds. This concept can be taken to an extreme and used to synthesize a set of virtual digital channels from one physical channel. Basically the system is implemented by adding a sound to the currently playing sound; in other words, by adding the requested sound data in real-time to the data that is being played.

For example, imagine that a sample is played by the sound card and another effect is to be played. The system at this point determines where in the data stream the current sound is being played, say location 5300. Then the software takes the remainder of the currently playing sound and adds it to the newly requested sound in real-time. The result is placed into a buffer, and the buffer containing the mixed sounds is played. Thus, the buffer contains the data of the previously playing sound summed with the new sound. Hence, through software, we have synthetically created two channels from one.

The major drawback to this technique is that it does take time to mix the two sounds. This introduces a small time lag along with the possibility (depending on architecture) that the destination buffer will have to be replicated to juggle the memory properly. However, when we couple virtual channels along with queuing techniques, a complex and rich sound system can be created.
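
The core of the virtual-channel idea, mixing a new sound into whatever is left of the one already playing, can be sketched as below. The sketch assumes unsigned 8-bit samples with silence at 128, clips the sums, and takes the current play position as a parameter (in practice that would have to come from the driver or from timing). Channel_Mix_In is a hypothetical name, and the step of handing the mixed buffer back to the sound card is not shown.

```c
#include <stddef.h>

/* Hypothetical helper: mix new_sound into the unplayed remainder of the
   sound that is currently playing. 'position' is the index of the next
   sample the card would have played. 'mixed' must have room for the larger
   of (playing_length - position) and new_length samples.
   Returns the number of samples written to 'mixed'. */
size_t Channel_Mix_In(const unsigned char *playing, size_t playing_length,
                      size_t position,
                      const unsigned char *new_sound, size_t new_length,
                      unsigned char *mixed)
{
    size_t remaining = (position < playing_length) ? playing_length - position : 0;
    size_t total     = (remaining > new_length) ? remaining : new_length;
    size_t index;

    for (index = 0; index < total; index++)
    {
        /* whichever sound has run out contributes silence (128) */
        int a = (index < remaining)  ? playing[position + index] : 128;
        int b = (index < new_length) ? new_sound[index]          : 128;
        int sum = a + b - 128;

        if (sum < 0)   sum = 0;
        if (sum > 255) sum = 255;

        mixed[index] = (unsigned char)sum;
    }

    return total;
}
```

In the example from the text, position would be 5300: everything from sample 5300 onward is summed with the new effect, and the resulting buffer replaces the one being played.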

We have spoken of digital sound, MIDI music, The Audio Solution, and we have generated a complete library to do our bidding. Now it’s time for some real Cybersorcery.

Building your own Digital Sound Device--DIGIBLASTER

Figure 6-18: System layout of DIGIBLASTER.

A true Cybersorcerer not only knows the ways of software, but the ways of hardware. By understanding the hardware within a computer, we can better understand how to write software to take advantage of it. Therefore, we are going to build a little project that will plug into the PC’s parallel ports and generate stereo digital sound! We’ll call it DIGIBLASTER, and it will prove to be a fun project as well as an illuminating experience. Take a look at Figure 6-18 to see a general layout of DIGIBLASTER.

Referring to the figure, we see that the hardware is plugged into both parallel printer ports of the PC. If your particular PC doesn’t have two ports, you will only need to build half of the hardware, but your DIGIBLASTER won’t be stereo—but that’s OK. DIGIBLASTER works by sending a stream of bytes through each of the parallel ports. These data streams are sent at the rate at which we wish the sound data to be played. The sound data is the standard 8-bit unsigned VOC data we have seen before. Hence, we will use the parallel ports as output devices to control our little synthesizer.

Figure 6-19: Output bitmapping of parallel ports.

How will our synthesizer work? Here’s the deal. We can communicate with each parallel port via a set of I/O ports. These I/O ports allow us to write bytes out to the parallel ports and receive status from each port. But we’ll only be using the output abilities of each port. Take a look at Figure 6-19 to see a bitmap layout of the parallel port. Notice that each port can be accessed by retrieving the actual port address from location [0000:0408 + 2*parallel_port_number], where the ports are numbered from 0 and each entry is a 16-bit word. We’ll discuss this in more detail in a moment. However, the main point is that we can write a byte out to each parallel port. This byte will be represented by a set of voltages on pins 2 through 9 of the female DB-25 connector on the parallel port itself. Each one of these voltages will be relative to ground, which is on pins 18 through 25 (we will use pin 25).

So what does all this mean? If you remember our discussion of digital sound generation, you will recall that digital sounds are generated by converting a stream of digital words into an analog signal using a digital-to-analog (D/A) converter. Therefore, if we place a D/A converter on the output of each parallel port and connect a pair of headphones to the outputs, any data we send to the parallel ports will be converted to analog voltages, and the headphones will vibrate accordingly. Now here comes the interesting part. If we send the data from a VOC file out to the parallel port at a high enough rate (the rate at which the VOC was sampled), we’ll actually hear the sound recorded, and presto, we have a digital sound reproduction system!

Of course, the hard part of this design is the electronics to convert the digital 8-bit words into analog voltages, but as it turns out, it’s not too difficult. Let’s take a look.

The Hardware

Figure 6-20: Physical layout of DIGIBLASTER.

To convert the digital outputs of each of the parallel ports, we are going to use a simple network of weighted resistors. Figure 6-20 shows the complete design of the system and the values of all the components. The circuit works as follows: When a binary word is sent to the port, the bits d0 through d7 (pins 2 through 9) will each have either +5 or 0 volts on them. These voltages will be converted into currents by each of the resistors R0 through R7. Each of the currents developed by the voltages is inversely proportional to the value of its resistor. For example, the d7 bit has the highest magnitude weight; therefore, the resistor connected to it has the smallest value, since the smaller the resistor, the larger the current (which is what we want).

By connecting resistors to each of the outputs on the parallel port, we in essence generate a current that is proportional to the applied binary word (see Figure 6-21). This will be true only if the resistors themselves have values that are in powers of 2. In the case of DIGIBLASTER, I have selected resistors with values 100, 200, 400, 800, 1.6k, 3.2k, 6.4k, and 12.8k ohms. You won’t actually be able to find most of these values unless you buy precision resistors, but purchase the values that are as close as possible.
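
The proportionality can be checked numerically. The sketch below adds up the ideal current through each resistor whose bit is set. It assumes the high level is whatever voltage is passed in (5 volts in the text), that the weighting runs from d7 on the 100 ohm resistor down to d0 on the 12.8k resistor, and it ignores the diode drop, the load, and the port's own output resistance. Dac_Current is a hypothetical name.

```c
/* Resistor on each data bit, indexed by bit number d0..d7 (ohms).
   The values are the ones listed in the text; the d0..d7 ordering is
   inferred from d7 carrying the smallest resistor. */
static const double dac_resistors[8] =
{
    12800.0, 6400.0, 3200.0, 1600.0, 800.0, 400.0, 200.0, 100.0
};

/* Hypothetical helper: ideal summed current in amperes for an 8-bit code,
   with every set bit driven to high_volts and every clear bit at 0 volts. */
double Dac_Current(unsigned char code, double high_volts)
{
    double current = 0.0;
    int bit;

    for (bit = 0; bit < 8; bit++)
    {
        if (code & (1 << bit))
            current += high_volts / dac_resistors[bit];   /* I = V / R */
    }

    return current;
}
```

With these values every step of the input code adds 5/12800 A (about 0.39 mA), so code 128 gives 50 mA and code 255 gives roughly 99.6 mA in this idealized model. The exact doubling from one resistor to the next is what keeps the steps equal.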

Figure 6-21: The digital-to-analog conversion.

The output currents that flow through each resistor are summed, and then the final current is passed through the diode D1, as shown in Figure 6-22. The diode is for protection. It allows current to flow only from the parallel port out, not the other way around. This is a good idea since the speaker or headphones that we connect the circuit to can develop a current (within the voice coil) and send it back through the port, and, as Bill Murray said in Ghostbusters, “This is a bad thing!”

Figure 6-22: Current feedback protection diode.

The next stage of the circuit is the volume control and filter. The capacitor C1 is used to filter out the “hiss” and white noise that is normally heard with a circuit of this kind. Basically, the capacitor creates a low-pass filter network with a single pole (for those of you who are EEs). Figure 6-23 shows the results of the filter on the sound’s frequency spectrum. The final portion of the circuit is used to attenuate the signal into the headphones and is basically a variable resistor called a potentiometer. It functions by means of a conducting slider moving over a semicircular wedge of carbon. As the potentiometer knob is turned, the slider moves, making contact with the carbon wedge at different points. The overall effect is a change in resistance, since the resistance of any conductor is proportional to its length. Hence, as the potentiometer is adjusted, the output current will develop a larger or smaller voltage when applied to the headphones, thus acting as a volume control.

Figure 6-23: The low-pass filter section.

The final analog signal from each parallel port is received by the headphones by tying the commons (grounds) of each port together and into the ground of the headphones, along with connecting the right channel of the headphones to the output of one parallel port and the left channel to the output of the other parallel port.

As you build the hardware, take your time and be careful not to cause a short when you make connections to the male DB-25 connector. If you have never built anything in your life, you will need:

  • A soldering iron (approximately $5.00)
  • Some solder (approximately $3.00)
  • Some 30-31 gauge wire (approximately $2.50)
  • All the parts listed in Figure 6-20

Figure 6-24: Prototype of DIGIBLASTER.

The circuit and construction will take an hour or two and, with a bit of luck, should work on the first try. Once you have put the whole thing together, it should look something like Figure 6-24, which is my prototype.

Now that we have the hardware (hopefully), let's write some software to control the DIGIBLASTER.

The Software

 the parallel ports and their access numbers.

Controlling the DIGIBLASTER is actually quite simple. We simply need to load a VOC file and send the audio data portion of it (a byte at a time) out to the parallel ports. However, to be intelligible, the data must be sent at the same rate at which the sample was taken. The major problem with all this is creating an exact time base with a resolution in kHz. You see, if the VOC file was sampled at 8 kHz, we have to write a function that sends the VOC data to the parallel port at the same rate. This can be done using tight timing loops or the internal timer, but is beyond the scope of this discussion. Hence, we will simply place a variable time delay in the function to slow it to any speed, and as the data is written to the port, the variable can be adjusted for reasonable results.

Since the DIGIBLASTER connects to two parallel ports, you will of course need two parallel ports. If you only have one, everything will still work, but the sound will be mono instead of stereo. The main function that we need to write is one that can write a byte to one of the four parallel ports defined in Table 6-3.

A while ago we learned that the I/O port address of each parallel port is stored as a 16-bit word in a small table that starts at memory location 0000:0408, so the entry for a given port lives at [0000:0408 + 2*parallel_port_number]. To make this access easy, we are going to define some constants and a parallel base pointer like this:

// the parallel port offsets

#define PAR_1_PORT      0
#define PAR_2_PORT      1
#define PAR_3_PORT      2
#define PAR_4_PORT      3

And we are going to define a pointer to access the I/O port of any parallel port:

unsigned int far *par_port = (unsigned int far*)0x00000408L;

Using the above declarations, let’s write a function that takes as parameters the parallel port we wish to write to along with the data we want written:

void Par_Write(int port,unsigned char data)
{
// this function writes a byte to one of the parallel ports
_outp(*(par_port+port),data);

} // end Par_Write

That looks easy! The dereferencing may seem a bit confusing, but it basically says, “Look into the table of 16-bit words at 0000:0408 - 0000:040F for the I/O address of the parallel port and then write data to it.” Because par_port points to 16-bit values, adding the port number to it steps through the table two bytes at a time. With the above function in hand, we are 90 percent done. All we need now to play a VOC file is to load it and send the data out to both the right and left channels, which will be two of the possible parallel ports LPT1 through LPT4. For example, say that you have parallel ports LPT1: and LPT3: on your PC. In that case, a fragment that would blast a VOC file to those specific ports would look like this:

for (index=0; index<length_of_sound_data; index++)
     {
     Par_Write(PAR_1_PORT, buffer[index]);
     Par_Write(PAR_3_PORT, buffer[index]);
     } // end for

Of course, the above fragment would probably only be heard by ants since the data is written out too fast! To slow it down, we should place a delay into the loop that forces each byte to be written more slowly. Here is a possible solution:

for (index=0; index<length_of_sound_data; index++)
     {
     Par_Write(PAR_1_PORT, buffer[index]);
     Par_Write(PAR_3_PORT, buffer[index]);

     // wait a bit
     for (time=0; time<delay; time++);

     } // end for

Unfortunately, there is no hard-and-fast rule to relate the delay factor to the playback frequency, since it will be different depending on the speed of your particular PC. But you can always use another more complex technique to ensure the playback frequency. Also, just as a bit of trivia, I have found that the maximum bandwidth of the parallel port is about 100K per second, which is more than ample for CD-quality playback rates.
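One possible way to get a starting value for the delay, sketched here under stated assumptions, is to time an empty loop once at startup and divide the measured loop rate by the sample rate. The helper names spin(), calibrate_loops_per_second(), and delay_for_rate() are hypothetical and are not part of the book's library. The result is only a rough first guess, because the time spent inside Par_Write( ) is not measured and the standard clock() function can be quite coarse, so the batch should be large enough to span many clock ticks.

```c
#include <time.h>

/* Hypothetical helper: busy-wait for count iterations. The volatile
   counter keeps an optimizing compiler from deleting the empty loop. */
static void spin(unsigned long count)
{
    volatile unsigned long t;

    for (t = 0; t < count; t++)
        ;
}

/* Estimate how many spin iterations this machine runs per second by
   timing one batch with clock(). Returns 0 if the batch finished too
   quickly to be measured. */
unsigned long calibrate_loops_per_second(unsigned long batch)
{
    clock_t start = clock();
    clock_t elapsed;

    spin(batch);
    elapsed = clock() - start;

    if (elapsed <= 0)
        return 0;

    return (unsigned long)((double)batch * CLOCKS_PER_SEC / (double)elapsed);
}

/* Convert a measured loop rate into the delay count for one sample
   period at the given sample rate. */
unsigned long delay_for_rate(unsigned long loops_per_second,
                             unsigned long sample_rate_hz)
{
    if (sample_rate_hz == 0)
        return 0;

    return loops_per_second / sample_rate_hz;
}
```

For instance, a machine that measures 8,000,000 loops per second would use a delay of about 1000 for an 8 kHz sample, and the value can then be adjusted by ear.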

As an example of using the DIGIBLASTER, I have created a little demo that uses LPT1: for the right channel, LPT2: for the left channel, and plays a pair of VOC files through each channel into your headphones. The demo slowly pans the volume of each channel from right to left, simulating a moving sound source. This leads us into the next topic, 3D sound. The name of the demo is 3DIGI.EXE and its source is 3DIGI.C. The program uses the function Sound_Load( ) from our DIGPAK interface library to load the sound, so you will have to link BLACK6.C and the previous library modules to create an executable. In any case, the source code is shown in Listing 6-16.

Listing 6-16 Program to drive DIGIBLASTER

// 3DIGI.C - A simple 3-D sound demo using the parallel ports and D/A
// convertors

// I N C L U D E S ///////////////////////////////////////////////////////////

#include <io.h>
#include <conio.h>
#include <stdio.h>
#include <stdlib.h>
#include <dos.h>
#include <bios.h>
#include <fcntl.h>
#include <memory.h>
#include <malloc.h>
#include <math.h>
#include <string.h>

#include "black3.h"
#include "black6.h"

// D E F I N E S /////////////////////////////////////////////////////////////

// the parallel port offsets

#define PAR_1_PORT     0
#define PAR_2_PORT     1
#define PAR_3_PORT     2
#define PAR_4_PORT     3

// G L O B A L S ////////////////////////////////////////////////////////////

// pointer to base parallel port

unsigned int far *par_port = (unsigned int far*)0x00000408L;

// the parallel ports to be used for right and left

int right_3D = PAR_1_PORT,
    left_3D  = PAR_2_PORT;

// the amount of power going to each channel (i.e. the volume)

float power_3D = .5; // each channel is at 50%, this will range 0..1

// F U N C T I O N S ////////////////////////////////////////////////////////

void Par_Write(int port,unsigned char data)
{

_outp(*(par_port+port),data);

} // end Par_Write

///////////////////////////////////////////////////////////////////////////////

void Sound_Play_3D(sound_ptr the_sound,float power_right, float power_left,int speed)
{

// this function will play a digitized sound through each of the parallel
// ports. Each channels output is controlled by the input powers. Note that
// a local replica of the sound is made so that the attenuated version
// can be computed in real-time without destroying the original

unsigned int freq,      // used to slow or speed up playback rate
             index,     // looping variable
             size;      // size of sound data

unsigned char far *right_buffer; // working buffers
unsigned char far *left_buffer;

// create new sound based on right and left power

size   = the_sound->SS.sndlen;

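right_buffer = (unsigned char far *)_fmalloc(size);
left_buffer  = (unsigned char far *)_fmalloc(size);

// give up quietly if either working buffer could not be allocated

if (right_buffer==NULL || left_buffer==NULL)
   {
   if (right_buffer!=NULL) _ffree((void far *)right_buffer);
   if (left_buffer!=NULL)  _ffree((void far *)left_buffer);
   return;

   } // end if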

// create the new sounds, one for each channel

for (index=0; index<size; index++)
    {

    // compute proper value based on power level

    right_buffer[index] =
             (unsigned char)((float)the_sound->SS.sound[index] * power_right);

    left_buffer[index] =
             (unsigned char)((float)the_sound->SS.sound[index] * power_left);

    } // end for index

     // play the sound

     for (index=0; index<size; index++)
         {

         // delay a bit to slow frequency down

         for (freq=0; freq<speed; freq++)
             {
             // write left channel

             Par_Write(left_3D,left_buffer[index]);

             // write right channel

             Par_Write(right_3D,right_buffer[index]);

             } // end time delay frequency control

         } // end for index

// release the memory

_ffree((void far *)right_buffer);
_ffree((void far *)left_buffer);

} // end Sound_Play_3D

// M A I N ///////////////////////////////////////////////////////////////////

void main(int argc, char **argv)
{

unsigned int index,    // looping variable
             done=0,   // exit flag
             delay=5;

sound effect_r,effect_l;  // the sound effects

float power_delta = .1;

// load the sound test sound effect into memory without translation

if (!Sound_Load("3dleft.voc",(sound_ptr)&effect_l,0))
   {
   printf("\nCouldn't load test sound 3DLEFT.VOC");
   return;

   } // end if

if (!Sound_Load("3dright.voc",(sound_ptr)&effect_r,0))
   {
   printf("\nCouldn't load test sound 3DRIGHT.VOC");
   return;

   } // end if


// display menu

printf("\n3D DIGITAL SOUND DEMO\n");
printf("\nUse the <S> and <F> keys to slow down and speed up the sound.");
printf("\nPress <Q> to exit.\n");

// enter event loop

while(!done)
     {

     // test which side the sound source is on

     if (power_3D > .5)
        Sound_Play_3D((sound_ptr)&effect_r,power_3D, (1-power_3D),delay);
     else
        Sound_Play_3D((sound_ptr)&effect_l,power_3D, (1-power_3D),delay);

     // pan sound from right to left, then left to right
     // make you dizzy!

     power_3D+=power_delta;

     if (power_3D > 1 || power_3D < 0)
        {
        power_delta=-power_delta;

        power_3D+=power_delta;

        } // end of reverse pan

     // test if user is hitting keyboard

     if (kbhit())
        {
        // what does user want to do

        switch(getch())
              {

              case 'f': // speed up the sound
                   {
                   if (--delay < 1)
                      delay=1;

                   } break;

              case 's': // slow down the sound
                   {
                   ++delay;

                   } break;

              case 'q': // exit
                   {
                   done=1;
                   } break;

              } // end switch

        } // end if kbhit

     } // end while

// unload the sounds

Sound_Unload((sound_ptr)&effect_l);
Sound_Unload((sound_ptr)&effect_r);

} // end main

The program begins by loading the two VOC files 3DRIGHT.VOC and 3DLEFT.VOC. These are played depending on the virtual position of the sound source. The program then proceeds to print out a little menu and enters the main event loop. The event loop continually plays the sound and determines if the user has pressed any keys. The user can either speed up or slow down the rate at which the sounds are played. The main workhorse of the program is the function Sound_Play_3D( ), which is a bit of a misnomer. It actually doesn’t know anything about 3D space; it only knows how to amplify a sound and create two copies of it that are each amplified or attenuated versions of the original. This amplification is based on a variable called power_3D, which is the amount of power that is sent to each channel.
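The scaling step can also be written with integers only. The sketch below assumes the sound data is 8-bit unsigned with 128 as the silent midpoint, which is how 8-bit VOC samples are normally stored. Under that assumption, multiplying the raw byte by the power (as the listing does) also drags the center of the waveform toward 0, so this version scales each sample about 128 instead. The names attenuate_u8() and attenuate_buffer() and the 0..256 fixed-point gain are hypothetical.

```c
/* Scale one 8-bit unsigned sample about its silent midpoint (128).
   gain is fixed point: 0 = silence, 256 = unchanged. */
unsigned char attenuate_u8(unsigned char sample, int gain)
{
    int centered = (int)sample - 128;   /* -128..127, 0 means silence */
    int scaled;

    if (gain < 0)   gain = 0;
    if (gain > 256) gain = 256;

    /* the product stays within -32768..32512, so a 16-bit int is enough */
    scaled = (centered * gain) / 256;

    return (unsigned char)(scaled + 128);
}

/* Fill one channel's working buffer from the source at the given gain. */
void attenuate_buffer(const unsigned char *src, unsigned char *dst,
                      unsigned long length, int gain)
{
    unsigned long i;

    for (i = 0; i < length; i++)
        dst[i] = attenuate_u8(src[i], gain);
}
```

A power of .5 corresponds to a gain of 128 here. Keeping the math in integers also avoids a floating-point multiply per sample, which matters on machines without a math coprocessor.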

When power_3D is equal to 1, all the power goes to the right side; when power_3D is equal to 0, all the power goes to the left side. Therefore, by varying power_3D in a cyclical manner, it will seem as if the sound is “walking” around your head. The main point to get out of this demo is how the sounds are played and amplified. The 3D portion of it is rather simplistic and was added just for fun. A final note: if your PC doesn’t support parallel LPT1: and LPT2:, simply change the variables right_3D and left_3D to the appropriate port constants.
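Rather than editing the constants by hand, a program could look through the port address table and use whichever ports are present, since the BIOS leaves a zero in the entry for a port that isn't installed. The following is a minimal sketch of that idea. It takes a copy of the four table entries as a parameter so it can be compiled and tested anywhere, and the function name find_parallel_ports() is hypothetical.

```c
/* Scan a copy of the four-entry parallel port address table and collect
   the indices (0..3, matching PAR_1_PORT..PAR_4_PORT) whose I/O address
   is non-zero. Returns the number of ports found. */
int find_parallel_ports(const unsigned short table[4], int found[4])
{
    int port;
    int count = 0;

    for (port = 0; port < 4; port++)
        {
        if (table[port] != 0)
            found[count++] = port;
        }

    return count;
}
```

In the demo, the four entries would be copied from par_port[0] through par_port[3]; the first port found could then be assigned to right_3D and the second (or the same one, for mono) to left_3D.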

With the DIGIBLASTER in hand, it is actually possible to create a 3D sound and play it! Let’s see how…

Implementing 3D Sound

Figure 6-25: Comparison of general 3D sound and 3D sound in a plane.

What is 3D sound? It is the sound that we hear in everyday life. When someone is behind us, we hear him differently than someone in front of us, or at any other position for that matter. Also, we can detect when sounds move closer or farther. Unfortunately, the sounds produced by most video games are not 3D, but mono. Even if the sounds are stereo, each channel doesn’t replicate the actual sound your right and left ears would hear if the sounds were 3D.

True 3D sound is so real, you can put on a pair of headphones and you’ll think whatever you’re listening to is right in front of you! The use of 3D sound is most prevalent in virtual reality systems. Since players must move around in Cyberspace, methods had to be devised to allow them to hear the objects as they would really be heard.

Implementing full 3D sound is complex at very best. An understanding of Fourier analysis, digital signal processing, and calculus is needed just to make a start! But in essence, 3D sound is created by synthesizing what a human’s right and left ears would hear if the 3D world and sounds were real. Computing these sounds in the general case is much too complex, so we’ll try for something a little less ambitious: 3D sound in a 2D plane.

Figure 6-25 shows a comparison between the two cases, general 3D sound and 3D sound in a plane. As you can see, the only information we need to determine the position of a 3D sound (which is in the same plane as the listener) is the position of the sound (that is, the angle between the sound and the listener) and the distance between the sound and the listener. Using this information and basic physics, we should be able to come up with a crude model of the 3D sound as it would be heard by the listener for any position and distance, as long as both the sound and listener lie in the same plane.

Figure 6-26: The planar 3D sound model.

The model we are going to create will be oversimplified but will produce decent results. Let’s begin by taking another look at the system model, as depicted in Figure 6-26. The only data we have to go on are the angle between the listener’s view direction (which is perpendicular to the line through both ears) and the sound source, and the distance between the sound and the listener. Let’s start with the distance.

In general, as a sound moves away, it becomes fainter. Therefore, we should attenuate the computed strength of the right and left channels in inverse proportion to the overall distance between the sound source and the listener. A first draft of the sound function might look like,

final_right_output_power = (K/distance)*right_power(angle);
final_left_output_power  = (K/distance)*left_power(angle);

where K is simply a constant, and the functions right_power( ) and left_power( ) return the relative power that each ear hears as a function of the angle between the sound source and the listener.

The determination of the functions right_power( ) and left_power( ), if done correctly, would be enough to fill an entire chapter. Hence, we are going to make some approximations. First, we are going to assume that the listener is a single point and has no dimensional volume. Second, we are also going to assume that sounds behind the listener sound the same as sounds in front of the listener. These approximations are brutal, but necessary if we want to make any progress.

Now the problem boils down to computing the intensity of an energy wave on a single point. Furthermore, the point is modeled with two mutually opposing sound sensors that are directed perpendicularly to the listener’s view direction. To compute the amount of energy, we will use the following hypothesis: If a sound is totally to the right of the listener, then only the right ear will be able to hear it; if a source is totally to the left of the listener, then only the left ear will hear it. If the sound source is directly in front of or behind the listener, then each ear will hear the sound equally well. Finally, if the sound source is somewhere in between, the power or loudness heard by each ear will vary linearly with the angle of the sound relative to the listener.

Taking all this into consideration, we can write the following formulas to compute the right and left strengths of the sounds,

right_power(angle) = (angle/180)
left_power(angle)  = (1 - (angle/180))

where angle varies from 0 to 180 degrees. For these formulas to agree with the hypothesis above, the angle is measured from the listener’s left side: 0 is totally to the left, 90 is directly in front or behind, and 180 is totally to the right.

You will notice that each channel will vary from 0 to 1.0. Working the above formulas into the previous distance calculations, the final results are:

final_right_output_power = (K/distance)*(angle/180)
final_left_output_power  = (K/distance)*(1 - (angle/180))

Of course, constants must be computed as well as the angle and distance between the sound source and listener, but the results will yield quite impressive 3D planar response. At least it will be good enough so that the player can determine the relative distance of an object and its relative right or left position. Granted, the sounds may not really sound 3D, but they will have the elements of 3D sound. Just as a 3D game only looks 3D, our approximation will only give some 3D hints.
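As a rough illustration of the planar model, the sketch below turns an angle and a distance into a pair of channel powers. It rests on two assumptions that should be checked against your own game: the angle runs from 0 (totally left) through 90 (directly ahead or behind) to 180 (totally right), and the distance term attenuates, so the sketch divides by the distance to make farther sources quieter. The function name planar_pan() is hypothetical, and treating any source closer than K as full volume is simply one convenient way to keep the outputs in the 0 to 1 range.

```c
/* Planar 3D sound sketch.
   angle_deg: 0 = totally left, 90 = ahead or behind, 180 = totally right
   distance:  listener-to-source distance, in the units k is tuned for
   k:         scale constant; sources closer than k play at full volume
   The two outputs are channel powers in the range 0..1. */
void planar_pan(float angle_deg, float distance, float k,
                float *right_out, float *left_out)
{
    float right, left, falloff;

    /* keep the angle inside the range the model is defined for */
    if (angle_deg < 0.0f)   angle_deg = 0.0f;
    if (angle_deg > 180.0f) angle_deg = 180.0f;

    right = angle_deg / 180.0f;
    left  = 1.0f - right;

    /* assumption: loudness falls off as k/distance, capped at 1 */
    if (k <= 0.0f)
        falloff = 0.0f;
    else if (distance > k)
        falloff = k / distance;
    else
        falloff = 1.0f;

    *right_out = falloff * right;
    *left_out  = falloff * left;
}
```

The two results can be passed straight to Sound_Play_3D( ) as power_right and power_left.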

So what can we use 3D sound trapped in a plane for? We can use it for a whole lot! In a tank game for instance, we could use it to make the tanks and other objects seem like they are moving around not only visually, but auditorily. This adds a whole new dimension to games. In many 3D games sounds become louder as objects near, but few games implement 3D panning at all–even in a plane. The main reason for this is that the player must wear headphones or have a stereo with speakers directly behind him. However, I feel that requesting a player to plug a set of headphones into the sound card is a small price to pay for the added realism produced by 3D sound techniques.

Of course, amplifying or attenuating sound effects without hardware support is a time-consuming problem. Nevertheless, the amount of time needed to create the proper amplitude-modulated versions of a specific sound for each right and left channel is usually negligible on a fast machine.

Summing up, 3D sound in a plane can be modeled by a simplified approximation of the physical system using only the distance and angle of the sound source relative to the listener. True, a real model would take into consideration details such as reflection, refraction, frequency response, group velocity, absorption, Doppler shift, resonance, and air density/temperature. But we’re making video games, not space shuttles, right?

Remote Sound FX

Figure 6-27: Setup for a linked game.

Remote sound FX only have meaning when two or more computers are linked by means of a modem or network of some kind. Imagine two players, each at a different location, playing a linked game of some sort (see Figure 6-27). Each local machine is running the same game software, and the games are running synchronously using techniques we will learn about later. Now imagine that the players want to talk to each other. How can they do this? Well, one solution would be to let them type to each other and send the message to the remote machine over the phone line along with the other game information.

Wouldn’t it be better if the players could actually hear each other talking or at least hear a voice that represented the opponent? This is the concept behind remote sound FX. Of course, there are many different levels we can take this concept to, but the simplest implementation has a set of sound FX on both machines. When a player presses a button, the local machine directs the remote machine to play the corresponding effect. For example, each machine could have a set of evil phrases, such as “die alien,” “eat photons,” “there can be only one!”, and so forth. Then by pressing a button on one machine, a message would be sent to the remote machine directing it to play one of the effects. Using this technique, the players could have a crude dialogue with a set of prerecorded phrases.
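Since both machines already hold the same set of effects, the message itself can be tiny. Here is a minimal sketch of one possible encoding: a marker byte followed by the effect number. The two-byte layout, the marker value, and the function names are all hypothetical; a real game would fold this into whatever packet format it already uses for its other game information.

```c
#define SFX_MSG_TAG   0xF5   /* hypothetical marker for "play sound effect" */
#define SFX_MSG_SIZE  2      /* marker byte plus effect number */

/* Build the message asking the remote machine to play effect_id. */
void sfx_msg_encode(unsigned char effect_id, unsigned char msg[SFX_MSG_SIZE])
{
    msg[0] = SFX_MSG_TAG;
    msg[1] = effect_id;
}

/* Decode a received message. Returns the effect number to play, or -1
   if the message is not a sound effect request or names an effect that
   this machine does not have. */
int sfx_msg_decode(const unsigned char msg[SFX_MSG_SIZE], int num_effects)
{
    if (msg[0] != SFX_MSG_TAG)
        return -1;

    if ((int)msg[1] >= num_effects)
        return -1;

    return (int)msg[1];
}
```

The receiving side would use the returned number as an index into its local table of loaded sounds and play that entry.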

We can take the technology one step further and add personalization to the voice. For example, during the setup phase of the game, each player could be asked to speak a set of phrases. When complete, the data for the phrases would be sent to the remote machines, giving each remote machine a local copy of the digitized effects. Thus, when a player wanted to say something to the opponent, the opponent would hear the actual voice of his adversary!

Such techniques are good and make a game ten times more fun, but let’s face it: the ultimate remote sound FX would have a voice recognition system on each machine that had a fairly large vocabulary (maybe 100 words). Then as a player spoke, messages would be sent to the remote machine directing it to play the digitized words. Hence, sentences could be spoken by each of the remote players. Of course, all these techniques use very little network bandwidth since each remote machine has a local copy of the sound effect in memory. However, if your network has a high enough bandwidth, it is feasible to send the actual voice signal!

In conclusion, remote sound FX are very cool and very easy to implement. You can rest assured that the game we construct at the end of the book will have them!

Voice Recognition

Figure 6-28: Voice prints for the nautical directions.

Before concluding the chapter, we should at least touch upon a technology that has been talked about for years, but never implemented well enough to be a valid input method. This is voice recognition technology. Voice recognition is basically a method of comparing a human’s voice against a voice print of some kind that is in memory. For years this technology has been riddled with problems associated with the amount of memory needed for the voice samples, the inability to recognize any speaker’s voice, and the advanced hardware needed to do voice recognition in real time.

Now, however, voice recognition is a reality and can be done with most advanced sound cards containing onboard DSP processors. Both the Sound Blaster series and Media Vision products have add-on components that allow their cards to recognize speech in a fairly reasonable manner. The recognition still can’t quite run in real-time and is speaker dependent, but it’s a start. Hopefully, we’ll start seeing games that use simple voice recognition, such as weapons commands, numbers, direction, and other small phrases, very soon.

As Cybersorcerers, we would like to use a library that performs voice recognition for us for the most popular sound cards, so we don’t have to write the software ourselves. However, I haven’t seen anything like this thus far (if you do, let me know). Hence, we’re going to discuss some of the mathematics behind voice recognition and some of the problems, which should allow you to implement a crude voice recognizer yourself, or at least have an idea where to begin.

Recognizing voice in real-time is much too complex to start with; let’s simply try to design a recognizer that first listens to the word and then decides if the word is in its vocabulary. We are going to make a single word recognizer.

Figure 6-28 shows the waveforms for a set of words that represent the four directions, north, south, east, and west. These waveforms were sampled at 12 kHz. Now all we have to do is record a user speaking a single word and then compare his input with the four voice prints. This is where the trouble begins. First, assuming that we sample at the same rate, what happens if the input word is slightly longer or shorter than any of the voice prints? There wouldn’t be an easy way to compare each value of the input word with the four voice prints. Moreover, this kind of comparison is practically futile.

Figure 6-29: The problem of averaging samples.

When humans speak, they very seldom reproduce exactly the same word. There are always small anomalies. Therefore, we must use some sort of statistical approach to comparing the samples. For example, we could average the voice prints and average the input word and compare. However, this won’t work because averaging time domain or time-dependent data loses the temporal aspects of the information, which is very important! Take a look at Figure 6-29. Here we see two voice prints of two words. The words themselves are irrelevant, but the fact of the matter is that the average of the data of each word is the same, so the words are no longer distinguishable.

Figure 6-30: Averaging sectors of a sound to maintain temporal information.

A better solution would be to average smaller chunks of the voice print as a function of time. For example, if a voice print took .5 seconds, maybe we would average it in 50 millisecond sections. Figure 6-30 shows this process graphically. Now if an input voice print was averaged the same way, a comparison would make more sense. However, the problem of different length samples still exists. What if the speaker waits 100 milliseconds before speaking? The input voice sample would be averaged and the first chunk would average to zero. Even though the rest of the sample may be a perfect match with one of the voice prints, a match couldn’t be found due to phase shifts. Thus, we must also take phase shifting into consideration and try to preprocess the input voice sample to make sure that we have the “meat” of it before we start the averaging process.
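The two steps just described, skipping the leading silence and then averaging in fixed-size sections, might be sketched as follows. The code assumes 8-bit unsigned samples with 128 as silence and averages the magnitude of each sample (its distance from 128) so that a quiet section averages toward zero. Both function names are hypothetical. At a 12 kHz sample rate, a 50 millisecond section is 600 samples.

```c
/* Return the index of the first sample whose distance from the silent
   midpoint (128) exceeds threshold, or n if the whole buffer is quiet. */
unsigned long find_speech_start(const unsigned char *s, unsigned long n,
                                int threshold)
{
    unsigned long i;

    for (i = 0; i < n; i++)
        {
        int mag = (int)s[i] - 128;
        if (mag < 0) mag = -mag;
        if (mag > threshold)
            return i;
        }

    return n;
}

/* Average the magnitude of each sector_len-sample section into out[].
   Returns the number of sections written (at most max_sectors). A
   trailing partial section is dropped. */
int sector_averages(const unsigned char *s, unsigned long n,
                    unsigned long sector_len,
                    unsigned char *out, int max_sectors)
{
    int count = 0;
    unsigned long start = 0;

    if (sector_len == 0)
        return 0;

    while (count < max_sectors && sector_len <= n - start)
        {
        unsigned long i, sum = 0;

        for (i = 0; i < sector_len; i++)
            {
            int mag = (int)s[start + i] - 128;
            if (mag < 0) mag = -mag;
            sum += (unsigned long)mag;
            }

        out[count++] = (unsigned char)(sum / sector_len);
        start += sector_len;
        }

    return count;
}
```

A recognizer would call find_speech_start( ) first and hand only the samples from that index onward to sector_averages( ), so the input and the stored voice prints line up at the start of the word.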

The above technique actually works if each voice print is very dissimilar and the input voice samples are spoken clearly and at the same rate as the voice prints. But what about amplitude? This brings another whole dimension to our problems. The input voice print may be an exact match with one of the words in the vocabulary, but it may not have the same amplitude. How do we deal with this problem? The simplest solution is to first normalize the input data so that all voice prints have maximum amplitude relative to themselves. Then when a comparison is done, the amplitudes are already normalized to, say, 255 so that the averaging makes sense.
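Normalizing could look something like the sketch below, which again assumes 8-bit unsigned samples centered on 128. It finds the loudest sample and stretches the whole buffer so that this peak just reaches full scale (255 on the positive side, 1 on the negative side), leaving silence at 128. The function name normalize_u8() is hypothetical.

```c
/* Stretch a buffer of 8-bit unsigned samples in place so that its
   loudest sample just reaches full scale. Silence stays at 128. */
void normalize_u8(unsigned char *s, unsigned long n)
{
    unsigned long i;
    int peak = 0;

    /* first pass: find the largest distance from the midpoint */
    for (i = 0; i < n; i++)
        {
        int mag = (int)s[i] - 128;
        if (mag < 0) mag = -mag;
        if (mag > peak) peak = mag;
        }

    if (peak == 0)
        return;   /* all silence, nothing to scale */

    /* second pass: scale so that the peak maps to +/-127 */
    for (i = 0; i < n; i++)
        {
        long centered = (long)s[i] - 128;
        long scaled   = centered * 127L / peak;

        s[i] = (unsigned char)(scaled + 128);
        }
}
```

Running both the stored voice prints and each input word through the same routine means a whispered "north" and a shouted "north" end up with comparable averages.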

Figure 6-31: The frequency spectrum of the nautical directions.

Using these techniques, I have implemented many voice recognizers that use a Sound Blaster or a simple A/D converter to listen to a word and then compare the input sample to a set of prerecorded voice prints (usually fewer than 10). The technique isn’t perfect; it works about 50 to 80 percent of the time. To improve voice recognition matching, other techniques can be used that give more information about the sample. One such method is based on frequency spectrum analysis. You see, we have only been looking at the amplitude of the samples over time. However, the frequency spectrum is much better since it’s like a fingerprint. For example, Figure 6-31 shows what the average frequency spectrums of the words “north,” “south,” “east,” and “west” might look like. Notice that they are different. The frequency spectrums will differ based on the speaker and the words; therefore, by coupling frequency spectrum information with amplitude information, a recognizer can be made that works consistently in the 90 percent range on a single voice.
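Whichever features are used (sector averages, frequency-band energies, or both), the final comparison can be a simple nearest-match search. The sketch below scores each voice print by the sum of absolute differences from the input and rejects the word if even the best score is too large. This particular distance measure, the function names, and the max_distance cutoff are illustrative choices, not necessarily what the recognizers described above used.

```c
/* Sum of absolute differences between two feature vectors of len bytes. */
unsigned long feature_distance(const unsigned char *a,
                               const unsigned char *b, int len)
{
    unsigned long total = 0;
    int i;

    for (i = 0; i < len; i++)
        {
        int d = (int)a[i] - (int)b[i];
        total += (unsigned long)(d < 0 ? -d : d);
        }

    return total;
}

/* Return the index of the closest voice print, or -1 if no print is
   within max_distance of the input. prints holds num_prints vectors
   of len bytes each, stored back to back. */
int best_match(const unsigned char *input,
               const unsigned char *prints, int num_prints, int len,
               unsigned long max_distance)
{
    int i;
    int best = -1;
    unsigned long best_dist = 0;

    for (i = 0; i < num_prints; i++)
        {
        unsigned long d = feature_distance(input, prints + (long)i * len, len);

        if (d <= max_distance && (best < 0 || d < best_dist))
            {
            best      = i;
            best_dist = d;
            }
        }

    return best;
}
```

The cutoff is what lets the recognizer say "not in my vocabulary" instead of always picking one of the stored words; a sensible value has to be found by experiment with real recordings.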

Alas, everything we have tried so far is not very smart. A voice recognition system should try to figure out what you’re saying even if you’re not saying it correctly. Moreover, it should be able to learn as you speak. Implementing these techniques can be done with the help of neural nets, sparse distributed memory, and similar fuzzy technologies. This is where the final solution of real-time voice recognition lies. However, with a Sound Blaster or similar card in hand and the basic techniques we have outlined, you should be able to write a simple recognizer that understands a few words—at least enough to arm weapons and fire!

Summary

In this chapter we have covered topics that are truly mysterious to the world. We can now command our sound cards to play digital samples and MIDI music as easily as we can plot a pixel. Furthermore, we learned some basic sound processing techniques as well as how sounds should be scheduled in a game. After that, we dove off the deep end and built some hardware. We used the elements of carbon, aluminum, and copper (just as ancient alchemists used in their spells) to create a sound synthesizer that plugs into the parallel ports of the PC! Then we used the hardware to get a glimpse of 3D sound and its properties. Finally, we touched upon a couple of concepts that aren’t new, but aren’t old either. Those, of course, are remote sound FX and voice recognition—techniques that can turn a video game into an engrossing auditory experience. Now it’s time to play God and bring the characters in our games to life.


Copyright 2006 Andre LaMothe