## Introduction for Audio 5.1

Audio five-point one is a common surround sound layouted in home theater. It has totally six channels: five speakers and a subwoofer and this is where the term five-point one comes from.

## Channel Layout

Since we have more than one channel, we need to define an explicit channel ordering to know which channel the sent/received data is matched to.

Channel layout specifies the order of input/output channel data in audio buffer. For example, if the layout is stereo, then we have two channel data. The first data is for left channel, the second one is right channel.

The arrangement for the channels is different from format to format. The most common formats are defined in Wave Format Extensible(or WaveEx) or by Society of Motion Picture & Television Engineers(or SMPTE). They defines what channels are provided and the ordering of them in different layouts.

In this post, we follow the SMPTE format. An advantage of taking this standard is that it specifies the up/downmixing behaviours between different layouts. (The SMPTE’s standard for multichannel can be founded in ITU-R BS.775 and ITU-R BS.2159-7.)

### Channels

Before knowing how the channels are sorted, we should know what channel is provided.

Code Channel Name
M Mono
L Left (Front Left)
R Right (Front Right)
C Center (Front Center)
LS Left Surround (Side Left)
RS Right Surround (Side Right)
RLS Rear Left Surround (Back Left)
RC Rear Center (Back Center)
RRS Rear Right Surround (Back Right)
LFE Low Frequency Effects

### Layouts

The layout in SMPTE’s format is as follows. Each layout is defined by a particular permutation of the above channels.

(The sample code to set channel layout on OSX is here.)

Layout channel order
DUAL-MONO L R
DUAL-MONO-LFE L R LFE
MONO M
MONO-LFE M LFE
STEREO L R
STEREO-LFE L R LFE
3F L R C
3F-LFE L R C LFE
2F1 L R RC
2F1-LFE L R LFE RC
3F1 L R C RC
3F1-LFE L R C LFE RC
2F2 L R LS RS
2F2-LFE L R LFE LS RS
3F2 L R C LS RS
3F2-LFE L R C LFE LS RS
3F3R-LFE L R C LFE RC LS RS
3F4-LFE L R C LFE RLS RRS LS RS

## Mixing

We already know that the audio layout can be configured into different types based on the number of the channels. The question is: what should we do when the input layout from the audio source doesn’t match the user’s output layout?

If the two channel layouts are equal, then they must have same numbers of channels and same channel order. Conversely, if two audio settings have different numbers of channels (e.g., {L, R} and {M, LFE}), or they have same numbers of channels but different orders (e.g., {L, R} and {R, L}), then they must have different channel layouts.

When the input layout is different from the output layout, we need to convert the audio input data to fit the audio output’s configuration. We call it mixing.

Here is the audio-mixer that can mix audio data from any input channel layout to any output channel layout. It’s implemented in Rust and used in Firefox.

### Mixing matrix

Although it may have different definitions to convert the audio data from input into output, they can be summarized into the following equations. The above figure illustrates their relationships, and the value of $m_{ij}$ varies from definition to definition.

\begin{align} L_{out} &= m_{11} \cdot L_{in} + m_{12} \cdot R_{in} + m_{13} \cdot C_{in} + m_{14} \cdot LFE_{in} + m_{15} \cdot LS_{in} + m_{16} \cdot RS_{in} \\ R_{out} &= m_{21} \cdot L_{in} + m_{22} \cdot R_{in} + m_{23} \cdot C_{in} + m_{24} \cdot LFE_{in} + m_{25} \cdot LS_{in} + m_{26} \cdot RS_{in} \\ C_{out} &= m_{31} \cdot L_{in} + m_{32} \cdot R_{in} + m_{33} \cdot C_{in} + m_{34} \cdot LFE_{in} + m_{35} \cdot LS_{in} + m_{36} \cdot RS_{in} \\ LFE_{out} &= m_{41} \cdot L_{in} + m_{42} \cdot R_{in} + m_{43} \cdot C_{in} + m_{44} \cdot LFE_{in} + m_{45} \cdot LS_{in} + m_{46} \cdot RS_{in} \\ LS_{out} &= m_{51} \cdot L_{in} + m_{52} \cdot R_{in} + m_{53} \cdot C_{in} + m_{54} \cdot LFE_{in} + m_{55} \cdot LS_{in} + m_{56} \cdot RS_{in} \\ RS_{out} &= m_{61} \cdot L_{in} + m_{62} \cdot R_{in} + m_{63} \cdot C_{in} + m_{64} \cdot LFE_{in} + m_{65} \cdot LS_{in} + m_{66} \cdot RS_{in} \end{align}

To simplify them, we can rewite these equations into a matrix form:

\begin{align} \vec{Audio_{out}} = \begin{bmatrix} L_{out} \\ R_{out} \\ C_{out} \\ LFE_{out} \\ LS_{out} \\ RS_{out} \end{bmatrix} &= \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} & m_{15} & m_{16} \\ m_{21} & m_{22} & m_{23} & m_{24} & m_{25} & m_{26} \\ m_{31} & m_{32} & m_{33} & m_{34} & m_{35} & m_{36} \\ m_{41} & m_{42} & m_{43} & m_{44} & m_{45} & m_{46} \\ m_{51} & m_{52} & m_{53} & m_{54} & m_{55} & m_{56} \\ m_{61} & m_{62} & m_{63} & m_{64} & m_{65} & m_{66} \\ \end{bmatrix} \cdot \begin{bmatrix} L_{in} \\ R_{in} \\ C_{in} \\ LFE_{in} \\ LS_{in} \\ RS_{in} \end{bmatrix} \\ &= \vec{Matrix_{mixing}} \cdot \vec{Audio_{in}} \end{align}

### Downmixing

When numbers of input channels > numbers of output channels, we call it downmixing. (In this case, the input channel layout is definitiely different from the output one.) The most common case for downmixing is to downmix different audio layouts into stereo. The audio sources on the internet have various layouts while most users only have two speakers.

(The downward mixing mechanism of SMPTE for audio 5.1 is defined in Table 2 of ITU-R BS.775-3.)

##### Downmix audio 5.1 to stereo(stereophonic sound)

\begin{align} L_{out} &= L_{in} + \frac{1}{\sqrt{2}} \cdot C_{in} + \frac{1}{\sqrt{2}} \cdot LS_{in} \\ R_{out} &= R_{in} + \frac{1}{\sqrt{2}} \cdot C_{in} + \frac{1}{\sqrt{2}} \cdot RS_{in} + \end{align}

\begin{align} L_{out} &= L_{in} + \frac{1}{\sqrt{2}} \cdot C_{in} \\ R_{out} &= R_{in} + \frac{1}{\sqrt{2}} \cdot C_{in} \\ LS_{out} &= LS_{in} \\ RS_{out} &= RS_{in} \end{align}

### Upmixing

When numbers of input channels < numbers of output channels, we call it upmixing. (In this case, the input channel layout is definitiely different from the output one.)

The most common case for this is to upmix stereo data(2 channels) into 3F2-LFE/audio 5.1(6 channels). There are several papers discussing how to achieve that.

### Other case

The other case happens when numbers of input channels = numbers of output channels, but their channel layouts are different.

The conversion is easy as converting from STEREO-LFE: {L, R, LFE} to 3F: {L, R, C} (simply passing data):

\begin{align} L_{out} &= L_{in} \\ R_{out} &= R_{in} \\ C_{out} &= 0 \end{align}

;it’s also as complicated as converting from 3F1: {L, R, C, RC} to 2F2(or quad): {L, R, LS, RS}:

\begin{align} L_{out} &= L_{in} + p \cdot C_{in} \\ R_{out} &= R_{in} + p \cdot C_{in} \\ LS_{out} &= q \cdot RC_{in} \\ RS_{out} &= q \cdot RC_{in} \end{align} , where $p, q$ are specific values.

## Implementation

Many open source cross-platform audio libraries are good refereces to learn how to implement multi-channel on different platforms, such as

My experience for developling multi-channel is limited on cubeb. The development project page is hosted here. From my experience, most documents for audio development are vague and sometimes you even cannot find the official manuals about how to use the APIs(especially on OSX). The best way to learn that is to read the source code on github. Read the how other people use the APIs or the APIs’ implementation directly if it’s possible.

It’s time to rock on the code!