Beamforming Based on the Cross Spectral Matrix

How does beamforming with the CSM work?

The CSM beamformer is a frequency-domain algorithm. The fundamental idea is to separate the data collected by the microphones (the temporal and spatial correlation between the signals) from the focusing information (the time delays and phase shifts between the microphones and the points of the scanning plane).

First, the sampled signals captured by the microphones are split into blocks of length $L = 2^n$ samples. The blocks are then windowed with 50% overlap ($O$), as depicted in Figure 1 for the signal $p_m(t)$ of microphone $m$.
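The block splitting described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Acoustic Camera's actual implementation: the function name `split_into_blocks`, the default block length of 1024 samples, and the Hann window are assumptions, since the text does not name a window type.

```python
import numpy as np

def split_into_blocks(signal, block_len=1024, overlap=0.5):
    """Return windowed blocks of shape (n_blocks, block_len)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < block_len:
        raise ValueError("signal is shorter than one block")
    # Hop size between block starts; L/2 for 50% overlap.
    step = int(block_len * (1.0 - overlap))
    n_blocks = 1 + (len(signal) - block_len) // step
    # Assumed window type (Hann); the source does not specify one.
    window = np.hanning(block_len)
    blocks = np.stack([signal[k * step : k * step + block_len]
                       for k in range(n_blocks)])
    return blocks * window
```

Samples left over after the last complete block are simply dropped in this sketch.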

The Fourier transforms of the blocks are then used to build the CSM: for each frequency, the cross-spectra between all pairs of microphones are averaged over the blocks. The geometrical information required for focusing, or steering, the Acoustic Camera on each scanned point of the map is captured by steering vectors. The arrival delays $Δt_j$ of the signals at microphone $j$ are determined from the known distances between the points of the scanning plane and the microphones, together with the speed of sound $c$.
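As an illustration of how the CSM is built, the following sketch averages the cross-spectra over blocks for every frequency bin. It assumes the windowed blocks of all microphones are already stacked into one array of shape `(M, K, L)` ($M$ microphones, $K$ blocks, $L$ samples per block); the function name `cross_spectral_matrix` is hypothetical, and scaling factors such as window power compensation are deliberately left out.

```python
import numpy as np

def cross_spectral_matrix(blocks):
    """blocks: real array of shape (M, K, L).

    Returns the CSM with shape (L // 2 + 1, M, M), one M x M
    Hermitian matrix per frequency bin.
    """
    blocks = np.asarray(blocks, dtype=float)
    # One-sided spectrum of every block: shape (M, K, F).
    spectra = np.fft.rfft(blocks, axis=-1)
    n_blocks = blocks.shape[1]
    # CSM[f, m, n] = mean over blocks k of P_m(f, k) * conj(P_n(f, k)).
    return np.einsum("mkf,nkf->fmn", spectra, spectra.conj()) / n_blocks
```

The diagonal of each matrix holds the auto-spectra of the individual microphones, and the off-diagonal entries hold the cross-spectra that carry the phase differences between microphones.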

If $\vec{x}_t$ is the position of a potential noise source, $M$ the number of microphones, and $ω_k$ the angular frequency under consideration, the steering vector can be written as

$$g(\vec{x}_t,\omega_k) = \frac{1}{M}\begin{pmatrix}e^{-i\omega_k\Delta t_0}\\ \vdots\\ e^{-i\omega_k\Delta t_{M-1}}\end{pmatrix}.$$
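A minimal sketch of this steering vector follows. It assumes that each delay is the absolute propagation time from the focus point to the microphone, $Δt_j = d_j / c$, with $d_j$ the Euclidean distance, and that $c = 343$ m/s; the function name `steering_vector` and its parameters are hypothetical.

```python
import numpy as np

def steering_vector(focus_point, mic_positions, freq_hz, c=343.0):
    """Steering vector g for one focus point and one frequency.

    focus_point:   (3,) coordinates of the scanned point in metres.
    mic_positions: (M, 3) microphone coordinates in metres.
    """
    mic_positions = np.asarray(mic_positions, dtype=float)
    focus_point = np.asarray(focus_point, dtype=float)
    # Assumed delay model: distance from focus point to each microphone / c.
    distances = np.linalg.norm(mic_positions - focus_point, axis=1)
    delays = distances / c
    omega = 2.0 * np.pi * freq_hz
    # Phase terms normalised by the number of microphones, as in the formula.
    return np.exp(-1j * omega * delays) / len(mic_positions)
```

Every entry has magnitude $1/M$; only the phases depend on the geometry and the frequency.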

Using this geometric information, it is possible to steer the Acoustic Camera towards potential noise sources. Finally, the beamforming output, or acoustic map, $b$ can be computed as

$$b(\vec{x}_t,\omega_k) = g^{\dagger}(\vec{x}_t,\omega_k)\cdot \mathrm{CSM}(\omega_k)\cdot g(\vec{x}_t,\omega_k),$$

in which $\dagger$ denotes the conjugate transpose.
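The quadratic form above can be evaluated for all scanned points at once. The sketch below assumes the steering vectors of $N$ scan points are stacked row-wise into an array of shape `(N, M)` and that the CSM for the chosen frequency is an `(M, M)` Hermitian matrix; the function name `beamform_map` is hypothetical.

```python
import numpy as np

def beamform_map(csm, steering):
    """Evaluate b = g^H * CSM * g for every scan point.

    csm:      (M, M) cross spectral matrix at one frequency.
    steering: (N, M) array, one steering vector per scan point.
    Returns a real array of shape (N,).
    """
    csm = np.asarray(csm)
    steering = np.asarray(steering)
    # Sum over m and k of conj(g[n, m]) * csm[m, k] * g[n, k].
    b = np.einsum("nm,mk,nk->n", steering.conj(), csm, steering)
    # For a Hermitian CSM the result is real up to rounding errors.
    return b.real
```

For a single unit-amplitude source whose microphone phases match one of the steering vectors, the map value at that point is 1 and no other point exceeds it, which is the behaviour the $1/M$ normalisation is meant to give.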