Introduction to Audio 5.1
Audio 5.1 ("five-point-one") is a common surround sound layout for home theaters. It has six channels in total: five speakers and a subwoofer, which is where the name five-point-one comes from.
Channel Layout
Since we have more than one channel, we need to define an explicit channel ordering so that we know which channel each piece of sent/received data belongs to.
The channel layout specifies the order of the input/output channel data in the audio buffer. For example, if the layout is stereo, then we have data for two channels: the first is for the left channel and the second is for the right channel.
The arrangement of the channels differs from format to format. The most common formats are defined in Wave Format Extensible (or WaveEx) or by the Society of Motion Picture & Television Engineers (or SMPTE). They define which channels are provided and how they are ordered in each layout.
In this post, we follow the SMPTE format. An advantage of this standard is that it specifies the up/downmixing behaviours between different layouts. (SMPTE’s standard for multichannel audio can be found in ITU-R BS.775 and ITU-R BS.2159-7.)
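To make the idea of channel ordering concrete, here is a minimal Python sketch. It assumes the buffer is interleaved, meaning one sample per channel for each frame, stored in layout order; that assumption is not stated above, and `split_channels` is a hypothetical helper written only for illustration.

```python
def split_channels(buffer, layout):
    """Group an interleaved sample buffer by channel name (illustrative helper)."""
    n = len(layout)
    if len(buffer) % n != 0:
        raise ValueError("buffer length must be a multiple of the channel count")
    # Within every frame, the sample at position i belongs to layout[i].
    return {name: buffer[i::n] for i, name in enumerate(layout)}


STEREO = ("L", "R")
# Three stereo frames: (L0, R0), (L1, R1), (L2, R2).
samples = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3]
channels = split_channels(samples, STEREO)
```

With the stereo layout, every even-indexed sample ends up under `"L"` and every odd-indexed sample under `"R"`. Swapping the layout to `("R", "L")` would assign the same data to the opposite speakers, which is why the ordering has to be explicit.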
Channels
Before looking at how the channels are ordered, we should know which channels are available.
Code | Channel Name |
---|---|
M | Mono |
L | Left (Front Left) |
R | Right (Front Right) |
C | Center (Front Center) |
LS | Left Surround (Side Left) |
RS | Right Surround (Side Right) |
RLS | Rear Left Surround (Back Left) |
RC | Rear Center (Back Center) |
RRS | Rear Right Surround (Back Right) |
LFE | Low Frequency Effects |
Layouts
The layouts in SMPTE’s format are as follows. Each layout is defined by a particular ordered selection of the above channels.
(The sample code to set channel layout on OSX is here.)
Layout | channel order | |||||||
---|---|---|---|---|---|---|---|---|
DUAL-MONO | L | R | ||||||
DUAL-MONO-LFE | L | R | LFE | |||||
MONO | M | |||||||
MONO-LFE | M | LFE | ||||||
STEREO | L | R | ||||||
STEREO-LFE | L | R | LFE | |||||
3F | L | R | C | |||||
3F-LFE | L | R | C | LFE | ||||
2F1 | L | R | RC | |||||
2F1-LFE | L | R | LFE | RC | ||||
3F1 | L | R | C | RC | ||||
3F1-LFE | L | R | C | LFE | RC | |||
2F2 | L | R | LS | RS | ||||
2F2-LFE | L | R | LFE | LS | RS | |||
3F2 | L | R | C | LS | RS | |||
3F2-LFE | L | R | C | LFE | LS | RS | ||
3F3R-LFE | L | R | C | LFE | RC | LS | RS | |
3F4-LFE | L | R | C | LFE | RLS | RRS | LS | RS |
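The table can be written down as plain data. The following Python sketch copies the layouts from the table above; the names `LAYOUTS` and `channel_index` are hypothetical and do not come from any library.

```python
# Channel order per layout, copied from the table above.
LAYOUTS = {
    "DUAL-MONO": ("L", "R"),
    "DUAL-MONO-LFE": ("L", "R", "LFE"),
    "MONO": ("M",),
    "MONO-LFE": ("M", "LFE"),
    "STEREO": ("L", "R"),
    "STEREO-LFE": ("L", "R", "LFE"),
    "3F": ("L", "R", "C"),
    "3F-LFE": ("L", "R", "C", "LFE"),
    "2F1": ("L", "R", "RC"),
    "2F1-LFE": ("L", "R", "LFE", "RC"),
    "3F1": ("L", "R", "C", "RC"),
    "3F1-LFE": ("L", "R", "C", "LFE", "RC"),
    "2F2": ("L", "R", "LS", "RS"),
    "2F2-LFE": ("L", "R", "LFE", "LS", "RS"),
    "3F2": ("L", "R", "C", "LS", "RS"),
    "3F2-LFE": ("L", "R", "C", "LFE", "LS", "RS"),
    "3F3R-LFE": ("L", "R", "C", "LFE", "RC", "LS", "RS"),
    "3F4-LFE": ("L", "R", "C", "LFE", "RLS", "RRS", "LS", "RS"),
}


def channel_index(layout_name, channel):
    """Return the position of `channel` inside the named layout."""
    return LAYOUTS[layout_name].index(channel)


# Audio 5.1 is the 3F2-LFE layout; its LFE data sits in the fourth slot.
lfe_slot = channel_index("3F2-LFE", "LFE")
```

Note that the position of a channel depends on the layout: LFE is at index 3 in 3F2-LFE but at index 2 in 2F2-LFE.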
Mixing
We already know that the audio layout can be configured into different types based on the number of channels. The question is: what should we do when the input layout from the audio source doesn’t match the user’s output layout?
If two channel layouts are equal, then they must have the same number of channels and the same channel order. Conversely, if two audio settings have different numbers of channels (e.g., {L, R} and {M, LFE}), or they have the same number of channels but in a different order (e.g., {L, R} and {R, L}), then they must have different channel layouts.
When the input layout is different from the output layout, we need to convert the audio input data to fit the audio output’s configuration. We call this conversion mixing.
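As a small illustration of this rule, the following Python sketch treats a layout as an ordered tuple of channel codes. The function name `same_layout` is hypothetical.

```python
def same_layout(a, b):
    """Two layouts are equal only if they list the same channels in the same order."""
    return tuple(a) == tuple(b)


equal = same_layout(("L", "R"), ("L", "R"))
different_channels = same_layout(("L", "R"), ("M", "LFE"))
different_order = same_layout(("L", "R"), ("R", "L"))
```

Comparing ordered tuples rather than sets is the point here: {L, R} and {R, L} contain the same channels, yet they are different layouts.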
Here is the audio-mixer that can mix audio data from any input channel layout to any output channel layout. It’s implemented in Rust and used in Firefox.
Mixing matrix
Although there are different definitions of how to convert the audio data from input to output, they can all be summarized by the following equations. The figure above illustrates their relationships, and the value of \(m_{ij}\) varies from definition to definition.
\[\begin{align} L_{out} &= m_{11} \cdot L_{in} + m_{12} \cdot R_{in} + m_{13} \cdot C_{in} + m_{14} \cdot LFE_{in} + m_{15} \cdot LS_{in} + m_{16} \cdot RS_{in} \\ R_{out} &= m_{21} \cdot L_{in} + m_{22} \cdot R_{in} + m_{23} \cdot C_{in} + m_{24} \cdot LFE_{in} + m_{25} \cdot LS_{in} + m_{26} \cdot RS_{in} \\ C_{out} &= m_{31} \cdot L_{in} + m_{32} \cdot R_{in} + m_{33} \cdot C_{in} + m_{34} \cdot LFE_{in} + m_{35} \cdot LS_{in} + m_{36} \cdot RS_{in} \\ LFE_{out} &= m_{41} \cdot L_{in} + m_{42} \cdot R_{in} + m_{43} \cdot C_{in} + m_{44} \cdot LFE_{in} + m_{45} \cdot LS_{in} + m_{46} \cdot RS_{in} \\ LS_{out} &= m_{51} \cdot L_{in} + m_{52} \cdot R_{in} + m_{53} \cdot C_{in} + m_{54} \cdot LFE_{in} + m_{55} \cdot LS_{in} + m_{56} \cdot RS_{in} \\ RS_{out} &= m_{61} \cdot L_{in} + m_{62} \cdot R_{in} + m_{63} \cdot C_{in} + m_{64} \cdot LFE_{in} + m_{65} \cdot LS_{in} + m_{66} \cdot RS_{in} \end{align}\]
To simplify them, we can rewrite these equations in matrix form:
\[\begin{align} \vec{Audio_{out}} = \begin{bmatrix} L_{out} \\ R_{out} \\ C_{out} \\ LFE_{out} \\ LS_{out} \\ RS_{out} \end{bmatrix} &= \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} & m_{15} & m_{16} \\ m_{21} & m_{22} & m_{23} & m_{24} & m_{25} & m_{26} \\ m_{31} & m_{32} & m_{33} & m_{34} & m_{35} & m_{36} \\ m_{41} & m_{42} & m_{43} & m_{44} & m_{45} & m_{46} \\ m_{51} & m_{52} & m_{53} & m_{54} & m_{55} & m_{56} \\ m_{61} & m_{62} & m_{63} & m_{64} & m_{65} & m_{66} \\ \end{bmatrix} \cdot \begin{bmatrix} L_{in} \\ R_{in} \\ C_{in} \\ LFE_{in} \\ LS_{in} \\ RS_{in} \end{bmatrix} \\ &= \vec{Matrix_{mixing}} \cdot \vec{Audio_{in}} \end{align}\]
Downmixing
When the number of input channels is greater than the number of output channels, we call it downmixing. (In this case, the input channel layout is definitely different from the output one.) The most common case is downmixing various audio layouts to stereo: audio sources on the internet come in various layouts, while most users only have two speakers.
(The downward mixing mechanism of SMPTE for audio 5.1 is defined in Table 2 of ITU-R BS.775-3.)
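As a rough sketch of how such a downmix can be applied, the Python snippet below multiplies one 5.1 frame by a mixing matrix. The \(\frac{1}{\sqrt{2}}\) coefficients are the ones used for the stereo and quad downmixes in this post; the names `mix_frame`, `TO_STEREO` and `TO_QUAD` are hypothetical, the sample values are made up, and this is not the code used by any particular library.

```python
import math

# Input order of audio 5.1 (3F2-LFE): L, R, C, LFE, LS, RS.
K = 1 / math.sqrt(2)

# One row per output channel, one column per input channel.
TO_STEREO = [
    [1, 0, K, 0, K, 0],  # L_out = L + K*C + K*LS
    [0, 1, K, 0, 0, K],  # R_out = R + K*C + K*RS
]

TO_QUAD = [
    [1, 0, K, 0, 0, 0],  # L_out  = L + K*C
    [0, 1, K, 0, 0, 0],  # R_out  = R + K*C
    [0, 0, 0, 0, 1, 0],  # LS_out = LS
    [0, 0, 0, 0, 0, 1],  # RS_out = RS
]


def mix_frame(matrix, frame):
    """Multiply the mixing matrix by one frame (one sample per input channel)."""
    return [sum(m * x for m, x in zip(row, frame)) for row in matrix]


# Made-up sample values for one 5.1 frame, in L, R, C, LFE, LS, RS order.
frame = [0.5, 0.25, 0.2, 0.9, 0.1, 0.3]
stereo = mix_frame(TO_STEREO, frame)
quad = mix_frame(TO_QUAD, frame)
```

The LFE column is zero in both matrices, so the LFE input does not contribute to any output channel in these two downmixes.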
Downmix audio 5.1 to stereo (stereophonic sound)
\[\begin{align} L_{out} &= L_{in} + \frac{1}{\sqrt{2}} \cdot C_{in} + \frac{1}{\sqrt{2}} \cdot LS_{in} \\ R_{out} &= R_{in} + \frac{1}{\sqrt{2}} \cdot C_{in} + \frac{1}{\sqrt{2}} \cdot RS_{in} \end{align}\]
Downmix audio 5.1 to quad (quadraphonic sound / 4.0 surround sound)
\[\begin{align} L_{out} &= L_{in} + \frac{1}{\sqrt{2}} \cdot C_{in} \\ R_{out} &= R_{in} + \frac{1}{\sqrt{2}} \cdot C_{in} \\ LS_{out} &= LS_{in} \\ RS_{out} &= RS_{in} \end{align}\]
Upmixing
When the number of input channels is less than the number of output channels, we call it upmixing. (In this case, the input channel layout is definitely different from the output one.)
The most common case is upmixing stereo data (2 channels) to 3F2-LFE/audio 5.1 (6 channels). There are several papers discussing how to achieve that.
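For reference, the simplest possible baseline is sketched below in Python: each output channel copies the input channel with the same name and stays silent if there is none. This is only a naive illustration and not one of the approaches from those papers; `passthrough_matrix` and `mix_frame` are hypothetical names.

```python
def passthrough_matrix(in_layout, out_layout):
    """Build a mixing matrix that copies same-named channels and zeroes the rest."""
    return [[1.0 if o == i else 0.0 for i in in_layout] for o in out_layout]


def mix_frame(matrix, frame):
    """Multiply the mixing matrix by one frame (one sample per input channel)."""
    return [sum(m * x for m, x in zip(row, frame)) for row in matrix]


STEREO = ("L", "R")
AUDIO_5_1 = ("L", "R", "C", "LFE", "LS", "RS")  # 3F2-LFE

matrix = passthrough_matrix(STEREO, AUDIO_5_1)
upmixed = mix_frame(matrix, [0.5, -0.5])
```

Here only the front left and right speakers produce sound; the center, LFE and surround channels receive zeros. A real upmixer would try to derive meaningful content for those channels.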
Other cases
The remaining case is when the number of input channels equals the number of output channels, but the channel layouts are different.
The conversion can be as easy as converting from STEREO-LFE: {L, R, LFE} to 3F: {L, R, C} (simply passing the shared channels through):
\[\begin{align} L_{out} &= L_{in} \\ R_{out} &= R_{in} \\ C_{out} &= 0 \end{align}\]
It can also be as complicated as converting from 3F1: {L, R, C, RC} to 2F2 (or quad): {L, R, LS, RS}:
\[\begin{align} L_{out} &= L_{in} + p \cdot C_{in} \\ R_{out} &= R_{in} + p \cdot C_{in} \\ LS_{out} &= q \cdot RC_{in} \\ RS_{out} &= q \cdot RC_{in} \end{align}\]
where \(p\) and \(q\) are coefficients whose specific values depend on the mixing definition.
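Both conversions can be expressed as mixing matrices. The Python sketch below builds them; the coefficients \(p\) and \(q\) are left as parameters because their values are not given here, and the value 0.5 used in the example is only a placeholder, not a recommended setting. All function and variable names are hypothetical.

```python
def mix_frame(matrix, frame):
    """Multiply the mixing matrix by one frame (one sample per input channel)."""
    return [sum(m * x for m, x in zip(row, frame)) for row in matrix]


# STEREO-LFE {L, R, LFE} -> 3F {L, R, C}: pass L and R through, C stays silent.
STEREO_LFE_TO_3F = [
    [1, 0, 0],  # L_out = L_in
    [0, 1, 0],  # R_out = R_in
    [0, 0, 0],  # C_out = 0 (the LFE input is dropped)
]


def matrix_3f1_to_2f2(p, q):
    """3F1 {L, R, C, RC} -> 2F2 {L, R, LS, RS} for given coefficients p and q."""
    return [
        [1, 0, p, 0],  # L_out  = L_in + p * C_in
        [0, 1, p, 0],  # R_out  = R_in + p * C_in
        [0, 0, 0, q],  # LS_out = q * RC_in
        [0, 0, 0, q],  # RS_out = q * RC_in
    ]


three_front = mix_frame(STEREO_LFE_TO_3F, [0.5, 0.25, 0.9])

# p = q = 0.5 is a placeholder chosen only to make the arithmetic easy to follow.
quad = mix_frame(matrix_3f1_to_2f2(0.5, 0.5), [0.5, 0.25, 0.5, 1.0])
```

In the second conversion, the single rear center channel is spread over both surround outputs, and the center channel is folded into the front left and right outputs.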
Implementation
Many open source cross-platform audio libraries are good references for learning how to implement multi-channel audio on different platforms.
My experience developing multi-channel support is limited to cubeb. The development project page is hosted here. From my experience, most documentation for audio development is vague, and sometimes you cannot even find official manuals on how to use the APIs (especially on OSX). The best way to learn is to read source code on GitHub: read how other people use the APIs, or read the APIs’ implementation directly if possible.
It’s time to rock on the code!