This chapter covers general theory on sound - what sound is, how sound is stored digitally, and an overview of 3D sound effects.
Sound results from a mechanical vibration that travels through some medium (such as air, or a table, or the ground). This vibration is transmitted via the collisions of molecules against one another. Our ears contain sensors that pick up these vibrations, which our brains then interpret as sound.
A sound's vibration can be represented as a waveform. The structure of the sound waveform determines how the sound will be perceived. The structure of a wave is measured in terms of frequency (or wavelength) and amplitude. Sound waves with a short wavelength (high frequency) create high-pitched sounds, while sound waves with longer wavelengths (low frequency) create low-pitched sounds. The amplitude (size) of the wave determines how loud the sound is.
Sound as a wave
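For a pure tone, the quantities described above can be written compactly. Here $A$ is the amplitude, $f$ the frequency, $T$ the period, $v$ the speed of sound in the medium and $\lambda$ the wavelength (these symbols are introduced here for illustration). A higher $f$ means a shorter $\lambda$ and a higher pitch, and a larger $A$ means a louder sound:

$$
y(t) = A \sin(2\pi f t), \qquad T = \frac{1}{f}, \qquad \lambda = \frac{v}{f}
$$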
Sound travels through air at approximately 343 meters per second (at room temperature, about 20 °C). Frequency is measured in Hertz (Hz): one hertz is one complete vibration (cycle) per second. The average human ear can pick up sounds within a range of roughly 20 to 20,000 Hertz. (Thus a 20 Hz sound has a wavelength of 343 / 20 ≈ 17 meters!)
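The wavelength calculation above is simply speed divided by frequency. A minimal sketch, assuming the room-temperature figure of 343 m/s (`wavelength` and `SPEED_OF_SOUND` are illustrative names, not from any library). It gives about 17 m at the bottom of the hearing range and under 2 cm at the top:

```python
# Assumed speed of sound in dry air at about 20 degrees C, in metres per second.
SPEED_OF_SOUND = 343.0

def wavelength(frequency_hz, speed=SPEED_OF_SOUND):
    """Return the wavelength in metres: wavelength = speed / frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed / frequency_hz

# Wavelengths at the two ends of the typical human hearing range.
for f in (20, 20_000):
    print(f"{f} Hz -> {wavelength(f):.3f} m")
```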
The human ear actually contains thousands of individual tiny hair-like sensors, each tuned to slightly different frequencies within the 20 to 20,000 Hz range. Different frequency sounds thus cause different sensors to vibrate. When the sensors vibrate, their movements are converted into neural (electrical) impulses and sent to the brain.
A physical sound wave is an "analog" source, because the range of possible wave amplitudes is continuous (smooth). A real-world analog signal thus has an infinite range of possible amplitude values. This is, uh, difficult to store in a fixed-size sample in computer memory. Instead, discrete ("stepped") values are used to approximate the original waveform.
There are two things that need to be considered when converting and storing a sound in digital form. Firstly, one must decide on the rate at which samples are taken, that is, the number of measurements of the wave amplitude that are saved per second. Typical "sampling frequencies" are 11.025, 22.05 and 44.1 kHz (often rounded to 11, 22 and 44 kHz). Secondly, we need to decide how many bits to use to store each sample. A 16-bit integer can hold 65536 (2^16) distinct values, so the amplitude of each sample is scaled into the signed range -32768 to 32767.
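The two decisions above (sampling rate and bits per sample) can be illustrated by sampling a pure sine tone and quantizing it to 16-bit integers. This is a minimal sketch under stated assumptions: the helper names `sample_sine` and `to_int16` are hypothetical, the input is assumed to lie in the range -1.0 to 1.0, and real converters usually also add filtering and dithering that are omitted here.

```python
import math

def sample_sine(freq_hz, sample_rate, duration_s, amplitude=1.0):
    """Sampling: measure a sine wave's amplitude at evenly spaced instants."""
    n = round(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def to_int16(samples):
    """Quantization: map floats in [-1.0, 1.0] to signed 16-bit integers.

    Values are clipped to [-1.0, 1.0] first. Scaling by 32767 keeps +1.0 and
    -1.0 symmetric, so this simple scheme never produces -32768.
    """
    return [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]

samples = sample_sine(440, 44_100, 0.01)  # 10 ms of a 440 Hz tone at 44.1 kHz
pcm = to_int16(samples)
print(len(pcm), min(pcm), max(pcm))
```

Each stored value is now one of a fixed set of integer steps, which is exactly the "stepped" approximation of the original wave; lowering the sampling rate or the number of bits makes the approximation coarser.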
The following diagram shows the effect of digitally sampling an analog signal. The black curve shows the actual sound wave. The red lines are straight lines drawn between the discrete sampled points. The resultant signal is an approximation of the original signal. In this simplified example, there are only 11 discrete possible amplitude values that can be stored for each sample.
Analog sound form, with corresponding digital representation
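Audio CDs use 16-bit samples (65536 possible values) at a sampling rate of 44.1 kHz. For stereo, two separate samples (one for the left channel and one for the right) are stored for each sampling instant.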
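The CD figures above determine the raw data rate, and stereo samples are typically stored interleaved (left, right, left, right, ...). A minimal sketch: the arithmetic uses the CD parameters, while the `interleave` helper is hypothetical and packs samples as little-endian signed 16-bit pairs, the layout used for 16-bit stereo PCM in .wav files.

```python
import struct

SAMPLE_RATE = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2   # 16 bits
CHANNELS = 2           # stereo

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
print(bytes_per_second)             # 176400
print(bytes_per_second * 8 / 1000)  # 1411.2 (kilobits per second)

def interleave(left, right):
    """Pack matching left/right 16-bit samples as L, R, L, R, ... frames."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    frames = bytearray()
    for l, r in zip(left, right):
        frames += struct.pack("<hh", l, r)  # "<hh": two little-endian int16s
    return bytes(frames)

data = interleave([0, 1000, -1000], [0, -500, 500])
print(len(data))  # 12 bytes: 3 frames x 2 channels x 2 bytes
```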
Real sounds, such as a human voice, are a complex mix of different frequencies and amplitudes. The waveform of an actual sound recording (the author saying 'hi': hi.wav) is shown below. The second screenshot shows one hundredth of a second's worth of the "hi" recording.
Waveform of the author saying 'hi' (hi.wav)
Portion of the waveform of the author saying 'hi', zoomed in to 1/100th of a second's worth
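To see how a "complex mix of different frequencies" arises, one can add several sine waves together and then take one hundredth of a second's worth of samples, as in the zoomed-in view. This is a simplified, hypothetical illustration: the `mix` helper and the chosen frequencies and amplitudes are invented for the example, and a real voice recording such as hi.wav contains far more components that change over time.

```python
import math

SAMPLE_RATE = 44_100  # samples per second

def mix(components, sample_rate, duration_s):
    """Sum several (frequency_hz, amplitude) sine components into one waveform."""
    n = round(sample_rate * duration_s)
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in components)
        for i in range(n)
    ]

# Hypothetical components: a 200 Hz fundamental plus two weaker overtones.
signal = mix([(200, 0.5), (400, 0.25), (600, 0.125)], SAMPLE_RATE, 0.5)

# One hundredth of a second's worth of samples.
window = signal[: SAMPLE_RATE // 100]
print(len(signal), len(window))  # 22050 441
```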
Mathematically, it can be shown that to accurately represent a wave of frequency n, you need to sample it at a rate greater than 2n. This is known as the Nyquist limit (also called the Nyquist rate). Since the average human cannot hear sounds above about 20,000 Hertz, storing a sound digitally with negligible quality loss requires a sampling rate of more than 40,000 Hertz. This is why audio CDs are sampled at 44.1 kHz (note that this figure is the number of amplitude samples recorded per second, not to be confused with the frequency of the sound itself). Sampling at less than 40 kHz means that audible frequencies above half the sampling rate cannot be represented: they are either lost (if filtered out before sampling) or folded back as spurious lower frequencies ("aliasing"), either of which can noticeably degrade the sound.
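The folding of frequencies above half the sampling rate can be computed directly. A minimal sketch for a single pure tone with no anti-aliasing filter (the `alias_frequency` name is hypothetical): because sampling cannot distinguish a frequency f from f plus any multiple of the sampling rate, anything above the Nyquist frequency reappears as a lower tone.

```python
def alias_frequency(freq_hz, sample_rate):
    """Return the frequency a pure tone appears as after sampling.

    Tones above half the sampling rate (the Nyquist frequency) fold back
    into the range 0 .. sample_rate / 2.
    """
    f = freq_hz % sample_rate       # sampling cannot tell f from f + k * rate
    return min(f, sample_rate - f)  # reflect the upper half back down

print(alias_frequency(15_000, 44_100))  # 15000: below 22050 Hz, kept as-is
print(alias_frequency(30_000, 44_100))  # 14100: above 22050 Hz, aliased
print(alias_frequency(15_000, 22_050))  # 7050: too high for a 22.05 kHz rate
```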
There are a number of file formats in use for storing sound. One of the most common is .wav format; other notable formats are .mp3 and .ogg.
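As a concrete look at the .wav format, Python's standard-library `wave` module can write and read uncompressed PCM .wav files. This sketch writes a short mono, 16-bit sine tone and reads back its header fields; the `write_tone` and `describe` helpers and the temporary file name are hypothetical choices for the example.

```python
import math
import os
import struct
import tempfile
import wave

def write_tone(path, freq_hz=440.0, sample_rate=22_050, duration_s=0.5):
    """Write a mono, 16-bit PCM .wav file containing a sine tone at half volume."""
    n = round(sample_rate * duration_s)
    frames = b"".join(
        struct.pack("<h", round(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)  # samples per second
        w.writeframes(frames)        # little-endian PCM data

def describe(path):
    """Read back (channels, bytes per sample, sample rate, number of frames)."""
    with wave.open(path, "rb") as r:
        return r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes()

path = os.path.join(tempfile.gettempdir(), "tone.wav")  # hypothetical file name
write_tone(path)
print(describe(path))  # (1, 2, 22050, 11025)
```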
Sound data can compress quite well. The MP3 file format, which is known as a "lossy" compression format because some information is discarded during compression, typically achieves compression ratios of around 10:1. In spite of the high compression ratio and the fact that the compression is lossy, at typical bitrates most people cannot hear the difference between an MP3 and its original recording.
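The roughly 10:1 figure follows from comparing bitrates. A small worked example, assuming a 128 kbit/s MP3 (a common bitrate, chosen here for illustration) against uncompressed CD-quality stereo, with a hypothetical 4-minute track:

```python
SAMPLE_RATE = 44_100  # CD sampling rate, samples per second per channel
BITS = 16             # bits per sample
CHANNELS = 2          # stereo
MP3_KBPS = 128        # assumed MP3 bitrate, kilobits per second

cd_bits_per_s = SAMPLE_RATE * BITS * CHANNELS  # 1,411,200 bit/s uncompressed
mp3_bits_per_s = MP3_KBPS * 1000

print(f"{cd_bits_per_s / mp3_bits_per_s:.1f}:1")  # 11.0:1

# Storage for a 4-minute (240 s) track, in bytes.
seconds = 240
print(cd_bits_per_s // 8 * seconds)   # 42336000 (about 42 MB)
print(mp3_bits_per_s // 8 * seconds)  # 3840000 (about 3.8 MB)
```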
Having two ears allows the brain to potentially determine the location of a sound source - that is, we can usually tell more or less where a sound is coming from. Generally there are two main methods that our brain uses to determine the location of a sound source. The first is the time-difference method: a sound coming from one side will take slightly longer to reach the ear on the other side, and our brains can detect this. The second is the intensity-difference method: a sound from one side typically reaches the ear on that side directly (and is thus clear), but to reach the ear on the other side it has to get around one's head, which partially blocks it, so it will be softer and more muffled.
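To get a feel for how small the time difference is, one commonly used approximation (swapped in here, not taken from this chapter) is Woodworth's spherical-head model, which treats the head as a sphere of radius r: ITD = (r / c)(θ + sin θ). This sketch assumes a head radius of 8.75 cm and a sound speed of 343 m/s, and is only meant for azimuths from 0 (straight ahead) to 90 degrees (directly to one side). For a source directly to one side it gives roughly 0.66 ms.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, an assumed typical adult value

def interaural_time_difference(azimuth_deg, radius=HEAD_RADIUS, speed=SPEED_OF_SOUND):
    """Woodworth spherical-head estimate of the arrival-time gap between the ears.

    azimuth_deg is 0 for a source straight ahead and 90 for one directly to the side.
    """
    theta = math.radians(azimuth_deg)
    return radius / speed * (theta + math.sin(theta))

for az in (0, 30, 90):
    print(f"{az:>2} deg: {interaural_time_difference(az) * 1000:.2f} ms")
```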
Our ability to accurately determine the direction a sound is coming from can be hindered by reflections off walls and other surfaces. For example, if something blocks the sound from directly reaching our ears, we may instead hear mainly the sound bouncing off something else, which makes it harder to tell where the source really is.
Sound waves emitted from a source that is moving towards the listener are "compressed" (the wavefronts bunch together), so the listener hears a higher frequency. If the source is moving away from the listener, the waves "stretch out" and the frequency heard is lower. The effect is clearly noticeable, for example when a car or motorbike drives past a person, especially at high speed. This is known as the Doppler effect, and is illustrated in the following figure:
Note that the Doppler effect also applies to a stationary sound source if the listener is moving.
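For motion along the straight line between source and listener, the classical Doppler formula is f' = f (c + v_listener) / (c - v_source). This sketch assumes the sign convention that each speed is positive when moving towards the other party and negative when moving away; the `doppler` function name and the 500 Hz horn are illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s, air at about 20 degrees C

def doppler(freq_hz, v_source=0.0, v_listener=0.0, c=SPEED_OF_SOUND):
    """Perceived frequency for motion along the source-listener line.

    Assumed sign convention: speeds are positive when moving towards the
    other party and negative when moving away.
    """
    if v_source >= c:
        raise ValueError("source must move slower than sound")
    return freq_hz * (c + v_listener) / (c - v_source)

# A 500 Hz horn on a car doing 30 m/s (about 108 km/h):
print(round(doppler(500, v_source=30)))    # approaching: 548
print(round(doppler(500, v_source=-30)))   # receding: 460
print(round(doppler(500, v_listener=30)))  # listener approaching a parked car: 544
```

Note that a moving listener and a moving source at the same speed do not give exactly the same shift, because sound travels relative to the air rather than relative to either party.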
DirectSound can simulate the Doppler effect, if given the position and velocity of the sound source and the listener.
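With 3D positions and velocities, only the velocity components along the line between source and listener contribute to the shift. The sketch below shows that idea in plain Python; it is a generic illustration, not DirectSound's API or its internal algorithm, and the `doppler_3d` name, the tuple-based vectors, and the units (metres and metres per second) are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def doppler_3d(freq_hz, src_pos, src_vel, lst_pos, lst_vel, c=SPEED_OF_SOUND):
    """Doppler-shifted frequency from 3D positions (m) and velocities (m/s)."""
    d = [l - s for s, l in zip(src_pos, lst_pos)]  # vector from source to listener
    dist = math.sqrt(sum(x * x for x in d))
    if dist == 0:
        return freq_hz                             # co-located: no defined direction
    u = [x / dist for x in d]                      # unit vector, source -> listener
    v_s = sum(a * b for a, b in zip(src_vel, u))   # source speed towards listener
    v_l = -sum(a * b for a, b in zip(lst_vel, u))  # listener speed towards source
    if v_s >= c:
        raise ValueError("source must move slower than sound")
    return freq_hz * (c + v_l) / (c - v_s)

# Car 100 m away on the x-axis, driving straight at a stationary listener at the origin.
print(round(doppler_3d(500, (100, 0, 0), (-30, 0, 0), (0, 0, 0), (0, 0, 0))))  # 548
# Same speed, but moving perpendicular to the line of sight: no shift.
print(round(doppler_3d(500, (100, 0, 0), (0, 30, 0), (0, 0, 0), (0, 0, 0))))   # 500
```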