Film soundtracks are usually made available in either "Stereo" or "5.1 Surround" sound, although other possibilities exist. Quite a few of the source sound recordings I've been using are "binaural" recordings, which sound eerily realistic over earphones, but often less impressive when played back on speakers. What does this stuff mean, and how can I use free software tools to make the most of it? This will be an ongoing learning experience, but I want to start with a brief description of these most common technologies, and how they are supported by the file formats we have available to us: Vorbis, FLAC, and WAV.
Making Movies with Free Software
This article is part of an ongoing series on the challenges I've faced in producing two free-licensed movies: Marya Morevna, through the Morevna Project, and Lunatics, which we are working on as Anansi Spaceworks.
This column may seem a little abstract for Free Software Magazine, but it provides a launching-off point (and a technical introduction) for a number of topics I hope to cover in upcoming columns. In order to process surround sound with free software tools, it's first important to understand what surround sound is, and why we might want to use it. An intriguing step along the way is true "binaural" sound, and how it differs from ordinary "stereo" sound.
The path to "surround sound"
The first sound recordings were "mono" -- that is to say, they just recorded a single waveform, representing frequency and volume information, but no directionality. This was fine for many purposes, but it was a little weak when it came to things like music.
Human hearing is three-dimensional. We can distinguish the direction and, to some degree, distance of a sound source. In fact, there's a wealth of information in the sounds that reach our ears, and our brains do some very sophisticated processing of that information.
I think most people realize that the main way that "stereo" sound works is by having sound come at different volumes in each ear -- if it's louder in your left ear (or coming out of the left speaker), then it seems to be coming from that direction. And vice-versa. That's called "panning": we say that the sound signal has been "panned" to the left speaker.
This is very popular, and works quite well. It's also very easy to do in a mixing console or in a mixing application like Audacity -- you just change the relative left and right amplitudes of the waveforms for the different elements you are going to record. In this way, you can take a bunch of independent mono recordings (of individual instruments, for example) and spread them out across the left-right spectrum.
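As a rough illustration of what "changing the relative amplitudes" means, here is a minimal Python sketch. The function name `pan_mono` and the -1.0 to +1.0 pan scale are my own inventions for the example, and the "constant-power" gain curve is just one common choice; I am not claiming it is what Audacity or any particular console does internally.

```python
import math

def pan_mono(samples, pan):
    """Split a mono signal into (left, right) sample lists.

    pan runs from -1.0 (hard left) through 0.0 (center) to +1.0 (hard right).
    This scale, and the constant-power law used below, are assumptions made
    for this sketch, not a description of any specific tool.
    """
    # Map pan onto an angle between 0 and pi/2.
    angle = (pan + 1.0) * math.pi / 4.0
    left_gain = math.cos(angle)    # 1.0 at hard left, near 0.0 at hard right
    right_gain = math.sin(angle)   # 0.0 at hard left, 1.0 at hard right
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right

# A short mono "recording", panned most of the way to the left.
left, right = pan_mono([0.0, 0.5, 1.0, 0.5], pan=-0.6)
```

Notice that the two output channels are identical apart from a scale factor. Nothing in them says anything about timing, which matters for what follows.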
To the sensitive ear, though, this sounds slightly "off" or "flat", and you might wonder why that is.
Another approach is to actually record in stereo, with two mics, simulating your ears. Some people go so far as to put the mics on a dummy head to simulate the effects of your head when you are listening (and yes, it does alter the sound). This kind of recording is called "binaural" (two-ear) recording, and if you have a chance, you should listen to some of these recordings on earphones.
Here are a few examples from Wikimedia Commons of binaural recordings which you might want to listen to:
- A working 17th century water mill at Weald and Downland Open Air Museum
- Two Skyballs bouncing indoors
- Sounds from a pool table
- Paper sounds and whispering demo
If you listen to these via earphones and close your eyes, I think you'll be impressed by how complete the auditory scenes appear in your head -- you're able to localize the sounds much better. Simple stereo panning cannot do this.
There are actually a number of subtle physical processes going on. Some of them have to do with the way sound goes around and/or through your head as it gets to the "far ear" from the sound source. This can cause frequency filtering effects or echoes. You also pick up echoes and reverberation or ambience from the walls of the room in which a recording is made.
But by far the most significant effect (other than volume, which panning models) is the phase change. Sound is fairly slow (at least compared to light), and there is a significant time delay between sound arriving at the nearer and further ears. This time delay causes the waveforms to be time-shifted relative to each other. Your brain is very sensitive to this information (which is a marvel of neurobiology and evolution, but we're just going to take it for granted here), and you interpret these changes as spatial information.
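To get a feel for the size of this delay, here is a back-of-the-envelope Python sketch. It rests on assumptions of my own: the ears are treated as two points about 18 cm apart, the source is far away, and the sound travels in a straight line (so the head itself is ignored). The names `interaural_delay` and `delay_samples` are made up for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second, air at about 20 C
EAR_SPACING = 0.18      # meters; an assumed, typical-looking value

def interaural_delay(azimuth_deg, ear_spacing=EAR_SPACING):
    """Seconds by which the far ear lags the near ear.

    azimuth_deg is the source direction from 0 (straight ahead) to
    90 (directly to one side). Simple path-difference model only.
    """
    extra_path = ear_spacing * math.sin(math.radians(azimuth_deg))
    return extra_path / SPEED_OF_SOUND

def delay_samples(samples, seconds, rate=44100):
    """Time-shift a channel by prepending silence (whole samples only)."""
    n = round(seconds * rate)
    return [0.0] * n + list(samples)

# A click arriving from directly to one side: delay the far-ear copy.
near_ear = [1.0, 0.0, 0.0]
far_ear = delay_samples(near_ear, interaural_delay(90))
```

With these numbers, a source directly to one side gives a delay of roughly half a millisecond, or a couple of dozen samples at CD sample rates. That is tiny, but it is exactly the kind of difference that plain panning never produces.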
Unfortunately, binaural sounds really only work through earphones. Once you put them on large speakers in a room, most of the subtlety is lost, and it's hard to tell them apart from ordinary panned stereo sound. That's because the sound no longer enters your ears unaltered -- instead it bounces around the room, past your furniture, reverberates off the walls, and so on, leaving the sensitive phase information muddled. Instead of hearing the complex sound image captured in the recording, you hear the acoustics of your own room.
What to do?
What's with the "5.1"?
One way is to add more speakers. Various configurations have been tried over the years, from 3 speakers all the way up to 10 speakers, but the most popular by far is a 6 speaker arrangement called "5.1 surround" sound.
In this arrangement, there are still "Left" and "Right" speakers, which are positioned in front of you (typically to either side of the screen for video), but there are also others: a "Center" speaker in front (right behind the screen), and then "Left Surround" and "Right Surround" speakers that are positioned behind you. That takes care of the "5".
Finally, the ".1" is a low-frequency effects ("LFE") channel which goes to a sub-woofer, typically located in front of you, although it would probably be best to place it directly under your chair. This is the speaker that makes the room shake when there's a particularly loud crash on the soundtrack -- very popular in action movies.
Now of course, you can have simpler versions: "quadraphonic" or "4.0 Surround" sound, for example, eliminates the "Center" and "LFE" channels, and was popular for a while in the 1970s. There are also more complex surround systems, which mainly add additional "Surround" speakers, giving us 7.1 ("Left Front", "Center", "Right Front", "Left", "Right", "Left Back", "Right Back", and "LFE"), and also 9.1.
Going from Binaural to Surround and back?
Clearly, since your brain can derive a 3D audio experience from either surround sound or binaural stereo sound through headphones, there must be some computational way to go from one to the other. And indeed this is true. However, it is complicated.
Looking into this is one of those "rabbit hole" moments when I discover that there's a whole field of endeavor that I didn't even know existed until now. Some interesting keywords (and Wikipedia links) may give you some idea: 3D Audio Effect, Head-Related Transfer Functions, Binaural Recording, Psychoacoustics, Sound Localization, and of course, Surround Sound.
Some work was done with this for the Blender Foundation's "Yo Frankie!" video game project, by Barcelona Media, which resulted in a paper (PDF 1.4MB) and a slide presentation (PDF 6MB) on the technique, which used the CLAM audio processing library in combination with Ardour and Blender to create simulations of 3D sound effects. Someday I may attempt to use this technique and document it here -- but not today.
File format support: Ogg Vorbis, WAV, and FLAC
There are basically four file formats that I work with regularly for processing sound: MP3, WAV, Ogg Vorbis, and FLAC. MP3 has a number of problems, including patent restrictions which make it a poor choice for us to use for our own work, but of course, a great deal of music -- even the free-licensed music we are mostly relying on -- is distributed this way by default.
So, of course, I have to support it on input, and so I'll address this first: MP3 does not support 5.1 surround sound, nor any kind of multi-channel sound beyond "Stereo". There may be some variations out there that contradict this, but they do not appear to be part of the MP3 standard. Of course, you can encode binaural recordings in any stereo format, and some of the binaural recordings I use as source material are in MP3 format.
The best free format for lossy compressed audio is, of course, Ogg Vorbis. Vorbis does support multichannel sound, and for many channels: some sources say "unlimited", others say 256, and the Vorbis I specification itself allows up to 255 -- in any case, it's more than enough.
Note that there is a distinction between an Ogg container file containing more than one Vorbis stream and a single Vorbis stream with multiple interlaced channels of audio! It's probably best to think of a file with separate Vorbis audio streams as a set of alternative audio tracks (and indeed this is how VideoLAN Client (VLC) handles such a file). An Ogg with a single Vorbis stream, by contrast, can hold many channels, which are intended to be played back simultaneously, to different speakers. The most common case is "Stereo", where the first channel goes to the "Left" speaker and the second goes to the "Right" speaker. This gets more complex, and less standard, with 5.1 surround sound, but the principle is the same.
However, 5.1 surround sound tends to be a high-fidelity need, which is somewhat in conflict with the bandwidth-conscious "good enough" ethos that applies to lossy compression formats.
So most of the time, when I'm trying to mix high-fidelity surround sound tracks, I'll be working with one of the two lossless formats that are available: uncompressed WAV format, and losslessly compressed FLAC format. FLAC (which stands for "Free Lossless Audio Codec") may not be very familiar to you if you're not an audiophile, but it has become a popular format for internet-sharing of lossless audio files. FLAC files tend to be much larger than either Ogg Vorbis or MP3 files, but also quite a bit smaller than uncompressed WAV files, which are, of course, very bulky.
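To make "several channels in one stream" concrete for the WAV case, here is a minimal sketch using only Python's standard `wave` and `struct` modules. The helper name `write_wav` is hypothetical, and the sketch assumes 16-bit samples supplied as plain integer lists of equal length, one list per channel. One caveat, as far as I can tell: the standard library writes a plain PCM header with no speaker map, so a player has to assume the conventional channel order.

```python
import io
import struct
import wave

def write_wav(dest, channels, rate=48000):
    """Write equal-length lists of 16-bit integer samples, one per channel.

    dest may be a file name or a writable, seekable file-like object.
    """
    frames = bytearray()
    for i in range(len(channels[0])):
        # One frame holds one sample from every channel, in channel order.
        for ch in channels:
            frames += struct.pack("<h", ch[i])
    with wave.open(dest, "wb") as w:
        w.setnchannels(len(channels))
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(bytes(frames))

# Six silent channels, a tenth of a second long, written to memory.
silence = [[0] * 4800 for _ in range(6)]
buffer = io.BytesIO()
write_wav(buffer, silence)
```

The inner loop is the whole trick: the channels are not stored one after another, but woven together one frame at a time, which is why the order of the channels within each frame matters so much.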
As I mentioned briefly above, there is a widespread convention on the correct order for stereo tracks -- the left channel is first, followed by the right. Life is not so easy with surround sound. Standard orders have been slow to emerge, and inconsistencies remain. FLAC follows the same convention as WAV files, but Vorbis uses a different order. I had to do a lot of digging to find this, so I want to end today's column with a reference table of the standard 5.1 surround sound channel assignments in these formats:
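These assignments can be written down as data. The Python sketch below lists the 5.1 channel orders as I understand them from the Vorbis I specification and from the WAV convention that FLAC shares; please check them against those documents before relying on them. The short labels ("L", "R", "C", "LFE", "Ls", "Rs") and the helper `reorder` are my own shorthand for this example, with "Ls" and "Rs" standing for the left and right surround speakers.

```python
# 5.1 channel order within each frame, first channel to last.
# Stated from memory of the respective specifications; treat as an
# assumption to verify, not as an authoritative reference.
WAV_FLAC_5_1 = ["L", "R", "C", "LFE", "Ls", "Rs"]
VORBIS_5_1 = ["L", "C", "R", "Ls", "Rs", "LFE"]

def reorder(channels, src_order, dst_order):
    """Rearrange a list of per-channel data from one order to another."""
    by_name = dict(zip(src_order, channels))
    return [by_name[name] for name in dst_order]

# Six channels decoded from a Vorbis file, rearranged for writing to WAV.
vorbis_channels = ["left", "center", "right", "ls", "rs", "lfe"]
wav_channels = reorder(vorbis_channels, VORBIS_5_1, WAV_FLAC_5_1)
```

Getting this wrong is easy to miss on a quick listen: the front left channel is first in both orders, but the center and LFE signals end up in the wrong speakers.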
This work may be distributed under the terms of the Creative Commons Attribution-ShareAlike License, version 3.0, with attribution to "Terry Hancock, first published in Free Software Magazine". Illustrations and modifications to illustrations are under the same license and attribution, except as noted in their captions (all images in this article are CC By-SA 3.0 compatible).