Binaural LCMV Beamforming with Partial Noise Estimation
Nico Gößling, Elior Hadad, Sharon Gannot, Simon Doclo

TL;DR
This paper introduces BLCMV-N, a novel binaural beamforming method that combines control over interfering source reduction with partial noise cue preservation, improving spatial impression and noise reduction trade-offs.
Contribution
The paper proposes BLCMV-N, integrating advantages of BLCMV and BMVDR-N, with theoretical analysis and experimental validation for enhanced binaural cue preservation and noise reduction.
Findings
BLCMV-N preserves binaural cues of interfering sources.
BLCMV-N balances noise reduction and cue preservation.
Experimental results confirm improved perceptual quality.
| | BMVDR | BLCMV | BMVDR-N | BLCMV-N |
| [dB] | 13.0 | 10.1 | 8.6 | 7.6 |
| [dB] | 12.9 | 9.2 | 8.6 | 7.0 |
| [dB] | -0.1 | 9.7 | 0.82 | 9.8 |
| [dB] | -4.3 | 8.7 | -2.4 | 8.9 |
| | 0.86 | 0.64 | 0.10 | 0.19 |
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Project ID 352015383 (SFB 1330 B2) and Project ID 390895286 (EXC 2177/1), and the Israeli Ministry of Science and Technology, #88962, 2019.
N. Gößling and S. Doclo are with the Department of Medical Physics and Acoustics and the Cluster of Excellence Hearing4all, University of Oldenburg, 26111 Oldenburg, Germany.
E. Hadad and S. Gannot are with the Faculty of Engineering, Bar-Ilan University, Ramat-Gan, 5290002, Israel.
Abstract
Besides reducing undesired sources, i.e., interfering sources and background noise, another important objective of a binaural beamforming algorithm is to preserve the spatial impression of the acoustic scene, which can be achieved by preserving the binaural cues of all sound sources. While the binaural minimum variance distortionless response (BMVDR) beamformer provides good noise reduction performance and preserves the binaural cues of the desired source, it does not allow controlling the reduction of the interfering sources and distorts the binaural cues of the interfering sources and the background noise. Hence, several extensions have been proposed. First, the binaural linearly constrained minimum variance (BLCMV) beamformer uses additional constraints, which makes it possible to control the reduction of the interfering sources while preserving their binaural cues. Second, the BMVDR with partial noise estimation (BMVDR-N) mixes the output signals of the BMVDR with the noisy reference microphone signals, which makes it possible to control the binaural cues of the background noise. Aiming at merging the advantages of both extensions, in this paper we propose the BLCMV with partial noise estimation (BLCMV-N). We show that the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals and the output signals of a BLCMV using an adjusted interference scaling parameter. We provide a theoretical comparison between the BMVDR, the BLCMV, the BMVDR-N and the proposed BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. Experimental results using recorded signals as well as the results of a perceptual listening test show that the BLCMV-N is able to preserve the binaural cues of an interfering source (like the BLCMV), while allowing a trade-off between noise reduction performance and binaural cue preservation of the background noise (like the BMVDR-N).
Index Terms:
Binaural cues, binaural noise reduction, MVDR beamformer, LCMV beamformer, hearing devices
I Introduction
Beamforming algorithms for head-mounted assistive hearing devices (e.g., hearing aids, earbuds and hearables) are crucial to improve speech quality and speech intelligibility in noisy acoustic environments. Assuming a binaural configuration where both devices exchange their microphone signals, the information captured by all microphones on both sides of the head can be exploited [1, 2, 3]. Besides reducing interfering sources (e.g., competing speakers) and background noise (e.g., diffuse babble noise), another important objective of a binaural beamforming algorithm is the preservation of the listener’s spatial impression of the acoustic scene. This can be achieved by preserving the binaural cues of all sound sources, i.e., the interaural level difference (ILD) and the interaural time difference (ITD) for coherent sources (desired source and interfering sources) and the interaural coherence (IC) for incoherent sound fields (background noise) [4]. Binaural cues play a major role for spatial perception, i.e., to localize sound sources and to determine the spatial width or diffuseness of a sound field [5], and are very important for speech intelligibility due to so-called binaural unmasking [6, 7].
Unlike monaural beamforming algorithms, binaural beamforming algorithms need to generate two output signals (i.e., one for each ear), hence typically processing all available microphone signals from both devices by two different spatial filters [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. A frequently used binaural beamforming algorithm is the binaural minimum variance distortionless response (BMVDR) beamformer, which aims at minimizing the power spectral density (PSD) of the noise component in the output signals while preserving the desired source component in the reference microphone signals on the left and the right device [2, 3, 11]. While the BMVDR provides good noise reduction performance and preserves the binaural cues of the desired source, it does not allow controlling the reduction of the interfering sources and distorts the binaural cues of the undesired sources (interfering sources and background noise). More specifically, after applying the BMVDR the binaural cues of the undesired sources are equal to the binaural cues of the desired source, such that all sources are perceived as coming from the same direction, which is obviously undesired. Hence, several extensions of the BMVDR have been proposed. On the one hand, the binaural linearly constrained minimum variance (BLCMV) beamformer uses additional interference reduction constraints, which makes it possible to control the reduction of the interfering sources by means of interference scaling parameters while preserving the binaural cues of the interfering sources in addition to those of the desired source [12, 14, 20, 17]. However, due to the additional constraints there are fewer degrees of freedom available for noise reduction, such that the noise reduction performance of the BLCMV is lower than that of the BMVDR. Furthermore, it is not possible to explicitly trade off between noise reduction performance and binaural cue preservation of the background noise.
On the other hand, the BMVDR with partial noise estimation (BMVDR-N) aims for the noise component in the output signals to be equal to a scaled version of the noise component in the reference microphone signals while preserving the desired source component in the reference microphone signals [3, 10, 11, 16]. It has been shown that the output signals of the BMVDR-N can be interpreted as a mixture between the output signals of the BMVDR and the noisy reference microphone signals, i.e., the BMVDR-N provides a trade-off between noise reduction performance and binaural cue preservation of the background noise. While for (incoherent) background noise the BMVDR-N showed promising results [16, 21], the effect of partial noise estimation on a (coherent) interfering source strongly depends on the position of the interfering source relative to the desired source and is harder to control [11].
Aiming at merging the advantages of the BLCMV and the BMVDR-N, i.e., preserving the binaural cues of the interfering sources and controlling the reduction of the interfering sources as well as the binaural cues of the background noise, in this paper we propose the BLCMV with partial noise estimation (BLCMV-N). First, we derive two decompositions for the BLCMV-N which reveal differences and similarities between the BLCMV-N and the BLCMV. We show that the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals and the output signals of a BLCMV using an adjusted interference scaling parameter. We then analytically derive the performance of the BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. We show that the output signal-to-noise ratio (SNR) of the BLCMV-N is smaller than or equal to the output SNR of the BLCMV and derive the optimal interference scaling parameter maximizing the output SNR of the BLCMV-N. The derived analytical expressions are first validated using measured anechoic acoustic transfer functions (ATFs). In addition, more realistic experiments are performed using recorded signals for a binaural hearing device in a reverberant cafeteria with one interfering source and multi-talker babble noise. Both the objective performance measures as well as the results of a perceptual listening test with 13 normal-hearing participants show that the proposed BLCMV-N is able to preserve the binaural cues and hence the spatial impression of the interfering source (like the BLCMV), while trading off between noise reduction performance and binaural cue preservation of the background noise (like the BMVDR-N).
The remainder of this paper is organized as follows. In Section II we introduce the considered binaural hearing device configuration and the used objective performance measures. In Section III we briefly review several binaural beamforming algorithms, namely the BMVDR, the BLCMV and the BMVDR-N. In Section IV we present the BLCMV-N and derive two decompositions. In Section V we provide a detailed theoretical analysis of the proposed BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. In Section VI we first validate the analytical expressions using anechoic ATFs, followed by simulations and a perceptual listening test using realistic recordings in a reverberant room.
II Hearing Device Configuration
In Section II-A the considered binaural hearing device configuration and the signal model are introduced. In Sections II-B and II-C the objective performance measures and the binaural cues are defined.
II-A Signal Model
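Consider the binaural hearing device configuration depicted in Figure 1 with $M_L$ microphones on the left side and $M_R$ microphones on the right side, i.e., $M = M_L + M_R$ microphones in total. In this paper we consider an acoustic scenario with one desired source (target speaker) and one interfering source (competing speaker) in a noisy and reverberant environment, where the background noise is assumed to be incoherent (e.g., diffuse babble noise, sensor noise).
In the frequency domain, the $m$-th microphone signal $y_m(\omega)$ can be decomposed as
$$y_m(\omega) = x_m(\omega) + u_m(\omega) + n_m(\omega), \quad m \in \{1, \dots, M\}, \tag{1}$$
with $\omega$ the normalized (radian) frequency, $x_m(\omega)$ the desired source component, $u_m(\omega)$ the interfering source component and $n_m(\omega)$ the noise component in the $m$-th microphone signal. The undesired component $v_m(\omega)$ is defined as the sum of the interfering source component and the noise component, i.e., $v_m(\omega) = u_m(\omega) + n_m(\omega)$. For the sake of conciseness, we omit the variable $\omega$ in the remainder of the paper wherever possible. The $M$-dimensional noisy input vector containing all microphone signals is defined as
$$\mathbf{y} = [y_1, y_2, \dots, y_M]^T, \tag{2}$$
where $(\cdot)^T$ denotes the transpose. Using (1), this vector can be written as
$$\mathbf{y} = \mathbf{x} + \mathbf{u} + \mathbf{n} = \mathbf{x} + \mathbf{v}, \tag{3}$$
where $\mathbf{x}$, $\mathbf{u}$, $\mathbf{n}$ and $\mathbf{v}$ are defined similarly as in (2).
For the considered acoustic scenario, the desired source component and the interfering source component can be written as
$$\mathbf{x} = \mathbf{a}\, s_x, \quad \mathbf{u} = \mathbf{b}\, s_u, \tag{4}$$
where $s_x$ and $s_u$ denote the desired source signal and the interfering source signal, respectively, and $\mathbf{a}$ and $\mathbf{b}$ denote $M$-dimensional ATF vectors, containing the ATFs between the microphones and the desired source and the interfering source, respectively. It should be noted that the ATFs include reverberation, microphone characteristics and the head-shadow effect.
Without loss of generality, the first microphone on each side is defined as the so-called reference microphone. To simplify the notation, the reference microphone signals $y_1$ and $y_{M_L+1}$ are denoted as $y_L$ and $y_R$, i.e.,
$$y_L = \mathbf{e}_L^T \mathbf{y}, \quad y_R = \mathbf{e}_R^T \mathbf{y}, \tag{5}$$
where $\mathbf{e}_L$ and $\mathbf{e}_R$ denote $M$-dimensional selection vectors with all elements equal to 0 except one element equal to 1, i.e., $\mathbf{e}_L(1) = 1$ and $\mathbf{e}_R(M_L+1) = 1$. Using (3), (4) and (5), the reference microphone signals can be written as
$$y_L = \underbrace{a_L s_x}_{x_L} + \underbrace{b_L s_u}_{u_L} + n_L, \tag{6}$$
$$y_R = \underbrace{a_R s_x}_{x_R} + \underbrace{b_R s_u}_{u_R} + n_R, \tag{7}$$
with $a_L = \mathbf{e}_L^T \mathbf{a}$, $a_R = \mathbf{e}_R^T \mathbf{a}$, $b_L = \mathbf{e}_L^T \mathbf{b}$ and $b_R = \mathbf{e}_R^T \mathbf{b}$ the ATFs at the reference microphones.
The noisy input covariance matrix $\mathbf{R}_y$, the desired source covariance matrix $\mathbf{R}_x$, the interfering source covariance matrix $\mathbf{R}_u$ and the noise covariance matrix $\mathbf{R}_n$ are defined as
$$\mathbf{R}_y = \mathcal{E}\{\mathbf{y}\mathbf{y}^H\}, \quad \mathbf{R}_x = \mathcal{E}\{\mathbf{x}\mathbf{x}^H\}, \tag{8}$$
$$\mathbf{R}_u = \mathcal{E}\{\mathbf{u}\mathbf{u}^H\}, \quad \mathbf{R}_n = \mathcal{E}\{\mathbf{n}\mathbf{n}^H\}, \tag{9}$$
with $\mathcal{E}\{\cdot\}$ the expected value operator and $(\cdot)^H$ the conjugate transpose. Assuming statistical independence between all signal components, $\mathbf{R}_y$ can be written as
$$\mathbf{R}_y = \mathbf{R}_x + \mathbf{R}_u + \mathbf{R}_n = \mathbf{R}_x + \mathbf{R}_v, \tag{10}$$
with $\mathbf{R}_v = \mathbf{R}_u + \mathbf{R}_n$ the undesired covariance matrix. Using (4), (8) and (9), the desired source covariance matrix and the interfering source covariance matrix can be written as rank-1 matrices, i.e.,
$$\mathbf{R}_x = p_x\, \mathbf{a}\mathbf{a}^H, \quad \mathbf{R}_u = p_u\, \mathbf{b}\mathbf{b}^H, \tag{11}$$
with $p_x = \mathcal{E}\{|s_x|^2\}$ the PSD of the desired source and $p_u = \mathcal{E}\{|s_u|^2\}$ the PSD of the interfering source. The noise covariance matrix $\mathbf{R}_n$ is assumed to be full-rank, i.e., invertible and positive definite.
The left and the right output signals $z_L$ and $z_R$ are obtained by filtering and summing all microphone signals using the $M$-dimensional filter vectors $\mathbf{w}_L$ and $\mathbf{w}_R$ (cf. Figure 1), i.e.,
$$z_L = \mathbf{w}_L^H \mathbf{y}, \quad z_R = \mathbf{w}_R^H \mathbf{y}. \tag{12}$$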
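The signal model above can be sketched numerically for a single frequency bin. The following NumPy snippet is only an illustration: the ATF vectors, the PSDs and the noise covariance matrix are random placeholders (assumptions, not measured data), and all variable names are chosen here for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
M_L, M_R = 2, 2                      # microphones per side (assumed values)
M = M_L + M_R

def crandn(*shape):
    """Complex standard-normal samples, used as placeholder data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

a, b = crandn(M), crandn(M)          # hypothetical ATF vectors (desired / interfering source)
p_x, p_u = 1.0, 0.5                  # assumed source PSDs

R_x = p_x * np.outer(a, a.conj())    # rank-1 desired source covariance matrix
R_u = p_u * np.outer(b, b.conj())    # rank-1 interfering source covariance matrix
A = crandn(M, M)
R_n = A @ A.conj().T + np.eye(M)     # Hermitian positive definite noise covariance matrix
R_v = R_u + R_n                      # undesired covariance matrix
R_y = R_x + R_v                      # noisy input covariance matrix

# Selection vectors: the first microphone on each side is the reference microphone
e_L = np.zeros(M); e_L[0] = 1.0
e_R = np.zeros(M); e_R[M_L] = 1.0
a_L, a_R = e_L @ a, e_R @ a          # ATFs at the reference microphones

# Filter-and-sum output for one noisy snapshot: z_L = w_L^H y
w_L, y = crandn(M), crandn(M)
z_L = np.vdot(w_L, y)                # np.vdot conjugates its first argument
```

Any Hermitian positive definite matrix could be used for `R_n` here; the random construction merely guarantees invertibility.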
II-B Objective Performance Measures
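The PSD and the cross power spectral density (CPSD) of the desired source component in the left and the right reference microphone signal are given by
$$p_{x,L}^{\mathrm{in}} = \mathbf{e}_L^T \mathbf{R}_x \mathbf{e}_L, \quad p_{x,R}^{\mathrm{in}} = \mathbf{e}_R^T \mathbf{R}_x \mathbf{e}_R, \quad p_{x,LR}^{\mathrm{in}} = \mathbf{e}_L^T \mathbf{R}_x \mathbf{e}_R.$$
Similarly, the output PSD of the desired source component in the left and the right output signal is given by
$$p_{x,L}^{\mathrm{out}} = \mathbf{w}_L^H \mathbf{R}_x \mathbf{w}_L, \quad p_{x,R}^{\mathrm{out}} = \mathbf{w}_R^H \mathbf{R}_x \mathbf{w}_R.$$
The same definitions can be applied for the noisy input signal, the interfering source component and the noise component by substituting $\mathbf{R}_x$ with $\mathbf{R}_y$, $\mathbf{R}_u$ or $\mathbf{R}_n$.
The narrowband input SNR in the left and the right reference microphone signal is defined as the ratio of the input PSD of the desired source and noise components, i.e.,
$$\mathrm{SNR}_L^{\mathrm{in}} = \frac{p_{x,L}^{\mathrm{in}}}{p_{n,L}^{\mathrm{in}}} = \frac{\mathbf{e}_L^T \mathbf{R}_x \mathbf{e}_L}{\mathbf{e}_L^T \mathbf{R}_n \mathbf{e}_L}, \quad \mathrm{SNR}_R^{\mathrm{in}} = \frac{p_{x,R}^{\mathrm{in}}}{p_{n,R}^{\mathrm{in}}} = \frac{\mathbf{e}_R^T \mathbf{R}_x \mathbf{e}_R}{\mathbf{e}_R^T \mathbf{R}_n \mathbf{e}_R}. \tag{17}$$
Similarly, the narrowband output SNR in the left and the right output signal is defined as the ratio of the output PSD of the desired source and noise components, i.e.,
$$\mathrm{SNR}_L^{\mathrm{out}} = \frac{p_{x,L}^{\mathrm{out}}}{p_{n,L}^{\mathrm{out}}} = \frac{\mathbf{w}_L^H \mathbf{R}_x \mathbf{w}_L}{\mathbf{w}_L^H \mathbf{R}_n \mathbf{w}_L}, \quad \mathrm{SNR}_R^{\mathrm{out}} = \frac{p_{x,R}^{\mathrm{out}}}{p_{n,R}^{\mathrm{out}}} = \frac{\mathbf{w}_R^H \mathbf{R}_x \mathbf{w}_R}{\mathbf{w}_R^H \mathbf{R}_n \mathbf{w}_R}. \tag{18}$$
The SNR improvement (in dB) is defined as $\Delta\mathrm{SNR} = 10\log_{10}\left(\mathrm{SNR}^{\mathrm{out}} / \mathrm{SNR}^{\mathrm{in}}\right)$, evaluated separately for the left and the right side.
The narrowband input signal-to-interference ratio (SIR) in the left and the right reference microphone signal is defined as the ratio of the input PSD of the desired source and interfering source components, i.e.,
$$\mathrm{SIR}_L^{\mathrm{in}} = \frac{p_{x,L}^{\mathrm{in}}}{p_{u,L}^{\mathrm{in}}} = \frac{\mathbf{e}_L^T \mathbf{R}_x \mathbf{e}_L}{\mathbf{e}_L^T \mathbf{R}_u \mathbf{e}_L}, \quad \mathrm{SIR}_R^{\mathrm{in}} = \frac{p_{x,R}^{\mathrm{in}}}{p_{u,R}^{\mathrm{in}}} = \frac{\mathbf{e}_R^T \mathbf{R}_x \mathbf{e}_R}{\mathbf{e}_R^T \mathbf{R}_u \mathbf{e}_R}. \tag{19}$$
Similarly, the narrowband output SIR in the left and the right output signal is defined as the ratio of the output PSD of the desired source and interfering source components, i.e.,
$$\mathrm{SIR}_L^{\mathrm{out}} = \frac{p_{x,L}^{\mathrm{out}}}{p_{u,L}^{\mathrm{out}}} = \frac{\mathbf{w}_L^H \mathbf{R}_x \mathbf{w}_L}{\mathbf{w}_L^H \mathbf{R}_u \mathbf{w}_L}, \quad \mathrm{SIR}_R^{\mathrm{out}} = \frac{p_{x,R}^{\mathrm{out}}}{p_{u,R}^{\mathrm{out}}} = \frac{\mathbf{w}_R^H \mathbf{R}_x \mathbf{w}_R}{\mathbf{w}_R^H \mathbf{R}_u \mathbf{w}_R}. \tag{20}$$
The SIR improvement (in dB) is defined as $\Delta\mathrm{SIR} = 10\log_{10}\left(\mathrm{SIR}^{\mathrm{out}} / \mathrm{SIR}^{\mathrm{in}}\right)$, evaluated separately for the left and the right side.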
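A minimal sketch of how the narrowband SNR and SIR improvements could be evaluated for a given filter vector. The helper names `psd` and `improvement_db` and the two-microphone toy values are hypothetical and only serve as an illustration.

```python
import numpy as np

def psd(w, R):
    """PSD w^H R w of a signal component with covariance matrix R after filtering with w."""
    return np.real(np.vdot(w, R @ w))

def improvement_db(w, e, R_target, R_undesired):
    """Improvement (in dB) of the target-to-undesired ratio for filter w,
    relative to the reference microphone selected by e."""
    ratio_in = psd(e, R_target) / psd(e, R_undesired)
    ratio_out = psd(w, R_target) / psd(w, R_undesired)
    return 10 * np.log10(ratio_out / ratio_in)

# Toy example with two microphones (all values are illustrative assumptions)
a = np.array([1.0, 1.0j])          # hypothetical ATF vector of the desired source
R_x = np.outer(a, a.conj())        # desired source covariance matrix with unit PSD
R_n = np.eye(2)                    # spatially white noise
e = np.array([1.0, 0.0])           # selects the reference microphone
w = a / 2                          # matched filter scaled such that w^H a = 1

delta_snr = improvement_db(w, e, R_x, R_n)
```

With these toy values the SNR improvement is $10\log_{10} 2 \approx 3.01$ dB. Passing the interfering source covariance matrix instead of the noise covariance matrix as `R_undesired` gives the SIR improvement.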
II-C Binaural Cues
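For coherent sources (desired source and interfering source) the main binaural cues used by the auditory system are the ILD and the ITD [4], which can be computed from the so-called interaural transfer function (ITF). Using (11), the input ITFs of the desired source and the interfering source are given by [11]
$$\mathrm{ITF}_x^{\mathrm{in}} = \frac{a_L}{a_R}, \tag{21}$$
$$\mathrm{ITF}_u^{\mathrm{in}} = \frac{b_L}{b_R}. \tag{22}$$
Similarly, the output ITFs of the desired source and the interfering source are given by
$$\mathrm{ITF}_x^{\mathrm{out}} = \frac{\mathbf{w}_L^H \mathbf{a}}{\mathbf{w}_R^H \mathbf{a}}, \tag{23}$$
$$\mathrm{ITF}_u^{\mathrm{out}} = \frac{\mathbf{w}_L^H \mathbf{b}}{\mathbf{w}_R^H \mathbf{b}}. \tag{24}$$
The ILD and the ITD can be calculated from the ITF as [11]
$$\mathrm{ILD} = 10\log_{10} |\mathrm{ITF}|^2, \quad \mathrm{ITD} = \frac{\angle\,\mathrm{ITF}}{\omega},$$
with $\angle(\cdot)$ denoting the unwrapped phase.
For an incoherent sound field (background noise), ILD and ITD cues are not very descriptive, but the IC is known to play a major role for spatial perception (e.g., spatial width or diffuseness) [4]. The input IC of the noise component is defined as
$$\mathrm{IC}_n^{\mathrm{in}} = \frac{\mathbf{e}_L^T \mathbf{R}_n \mathbf{e}_R}{\sqrt{(\mathbf{e}_L^T \mathbf{R}_n \mathbf{e}_L)(\mathbf{e}_R^T \mathbf{R}_n \mathbf{e}_R)}},$$
while the output IC of the noise component is defined as
$$\mathrm{IC}_n^{\mathrm{out}} = \frac{\mathbf{w}_L^H \mathbf{R}_n \mathbf{w}_R}{\sqrt{(\mathbf{w}_L^H \mathbf{R}_n \mathbf{w}_L)(\mathbf{w}_R^H \mathbf{R}_n \mathbf{w}_R)}}. \tag{26}$$
Because the IC is typically complex-valued, the magnitude-squared coherence (MSC) is often used. The input and the output MSC of the noise component are defined as
$$\mathrm{MSC}_n^{\mathrm{in}} = |\mathrm{IC}_n^{\mathrm{in}}|^2, \quad \mathrm{MSC}_n^{\mathrm{out}} = |\mathrm{IC}_n^{\mathrm{out}}|^2. \tag{27}$$
An MSC of 1 corresponds to a coherent source perceived as a distinct point source, while smaller MSC values correspond to a broader or even diffuse sound field impression [4].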
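The binaural cues defined above can be evaluated with a few lines of NumPy. This is a sketch under simplifying assumptions: a single frequency bin is considered, so the phase is not unwrapped across frequency, and the function names and numerical values are hypothetical.

```python
import numpy as np

def ild_db(itf):
    """ILD in dB from a (complex) ITF."""
    return 10 * np.log10(np.abs(itf) ** 2)

def itd(itf, omega):
    """ITD from the ITF phase. With omega in rad/s the result is in seconds;
    with a normalized frequency it is in samples. Single bin, no unwrapping."""
    return np.angle(itf) / omega

def msc(w_L, w_R, R):
    """Magnitude-squared coherence between the two outputs for covariance matrix R."""
    cpsd = np.vdot(w_L, R @ w_R)
    psd_L = np.real(np.vdot(w_L, R @ w_L))
    psd_R = np.real(np.vdot(w_R, R @ w_R))
    return np.abs(cpsd) ** 2 / (psd_L * psd_R)

# Illustrative values (assumptions)
itf = 2.0 * np.exp(1j * 0.3)             # level ratio of 2, phase of 0.3 rad
omega = 2 * np.pi * 500.0                # 500 Hz expressed in rad/s
R_n = np.array([[1.0, 0.5], [0.5, 1.0]]) # noise covariance with coherence 0.5
e_L, e_R = np.array([1.0, 0.0]), np.array([0.0, 1.0])

ild_value = ild_db(itf)                  # about 6.02 dB
itd_value = itd(itf, omega)              # 0.3 / (2*pi*500) s
msc_in = msc(e_L, e_R, R_n)              # input MSC, 0.25 here
```

Using the selection vectors as filters yields the input MSC, whereas using the beamformer filter vectors yields the output MSC.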
III Binaural Beamforming Algorithms
In this section we briefly review three state-of-the-art binaural beamforming algorithms, namely the BMVDR beamformer, the BLCMV beamformer and the BMVDR-N beamformer. We discuss the performance of these beamforming algorithms in terms of noise and interference reduction performance and binaural cue preservation. For the sake of conciseness, we only show expressions for the left hearing device, denoted by the subscript $L$. It should be noted that all expressions can also be formulated for the right hearing device by changing the subscript to $R$.
III-A BMVDR Beamformer
The BMVDR aims at minimizing the output PSD of the noise component while preserving the desired source component in the reference microphone signals [2, 3, 11]. The constrained optimization problem for the left filter vector is given by
$$\min_{\mathbf{w}_L}\ \mathbf{w}_L^H \mathbf{R}_n \mathbf{w}_L \quad \text{s.t.} \quad \mathbf{w}_L^H \mathbf{a} = a_L. \tag{28}$$
Using (4), (6) and (9), the solution of (28) is equal to [2, 22, 23]
$$\mathbf{w}_{\mathrm{BMVDR},L} = \frac{\mathbf{R}_n^{-1}\mathbf{a}}{\gamma_a}\, a_L^*, \tag{29}$$
with
$$\gamma_a = \mathbf{a}^H \mathbf{R}_n^{-1} \mathbf{a}. \tag{30}$$
It should be noted that the BMVDR can also be defined using the undesired covariance matrix $\mathbf{R}_v$ instead of the noise covariance matrix $\mathbf{R}_n$. However, since $\mathbf{R}_v$ is considerably more difficult to estimate or model in practice than $\mathbf{R}_n$, in this paper we only consider the BMVDR using $\mathbf{R}_n$ in (29).
By substituting (29) in (18) and (20), it has been shown in [3, 11] that the output SNR and the output SIR of the BMVDR are equal to
$$\mathrm{SNR}_{\mathrm{BMVDR},L}^{\mathrm{out}} = p_x \gamma_a, \tag{31}$$
$$\mathrm{SIR}_{\mathrm{BMVDR},L}^{\mathrm{out}} = \frac{p_x}{p_u}\, \frac{\gamma_a^2}{|\gamma_{ab}|^2}, \tag{32}$$
with $\gamma_a$ defined in (30) and
$$\gamma_{ab} = \mathbf{a}^H \mathbf{R}_n^{-1} \mathbf{b}. \tag{33}$$
Although the BMVDR yields the largest output SNR among all distortionless binaural beamforming algorithms, the output SIR depends on the relative position of the interfering source to the desired source, cf. (33).
As shown in [3, 11, 13], the BMVDR preserves the binaural cues of the desired source, i.e.,
$$\mathrm{ITF}_{x}^{\mathrm{out}} = \frac{a_L}{a_R} = \mathrm{ITF}_x^{\mathrm{in}}, \tag{34}$$
but distorts the binaural cues of the undesired sources, i.e., for the interfering source
$$\mathrm{ITF}_{u}^{\mathrm{out}} = \frac{a_L}{a_R} = \mathrm{ITF}_x^{\mathrm{in}}, \tag{35}$$
and for the background noise
$$\mathrm{MSC}_{n}^{\mathrm{out}} = 1. \tag{36}$$
Hence, at the output of the BMVDR the interfering source and the (incoherent) background noise are perceived as coming from the direction of the desired source, which is obviously undesired in terms of spatial awareness.
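The properties of the BMVDR discussed above can be checked numerically. The sketch below uses random placeholder ATF vectors and a random positive definite noise covariance matrix (assumptions made only for illustration) and verifies the distortionless response, the distorted ITF of the interfering source and the output MSC of 1.

```python
import numpy as np

rng = np.random.default_rng(1)
M, M_L = 4, 2                              # assumed microphone configuration

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

a, b = crandn(M), crandn(M)                # hypothetical ATF vectors
A = crandn(M, M)
R_n = A @ A.conj().T + np.eye(M)           # positive definite noise covariance matrix

Rinv_a = np.linalg.solve(R_n, a)           # R_n^{-1} a
gamma_a = np.real(np.vdot(a, Rinv_a))      # a^H R_n^{-1} a
a_L, a_R = a[0], a[M_L]                    # ATFs at the reference microphones

# Left and right BMVDR filter vectors (same spatial filter, different scaling)
w_L = Rinv_a / gamma_a * np.conj(a_L)
w_R = Rinv_a / gamma_a * np.conj(a_R)

resp_L = np.vdot(w_L, a)                            # w_L^H a, equals a_L
itf_u_out = np.vdot(w_L, b) / np.vdot(w_R, b)       # equals a_L / a_R
psd_n_L = np.real(np.vdot(w_L, R_n @ w_L))          # equals |a_L|^2 / gamma_a
psd_n_R = np.real(np.vdot(w_R, R_n @ w_R))
msc_out = np.abs(np.vdot(w_L, R_n @ w_R)) ** 2 / (psd_n_L * psd_n_R)
```

Because both filter vectors are scaled versions of the same spatial filter, the residual noise is fully coherent between the two outputs, which is why `msc_out` equals 1.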
III-B BLCMV Beamformer
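In addition to preserving the desired source component in the reference microphone signals, the BLCMV preserves a scaled version of the interfering source component in the reference microphone signals while minimizing the output PSD of the noise component [12, 14]. The constrained optimization problem for the left filter vector is given by [14]
$$\min_{\mathbf{w}_L}\ \mathbf{w}_L^H \mathbf{R}_n \mathbf{w}_L \quad \text{s.t.} \quad \mathbf{w}_L^H \mathbf{a} = a_L, \quad \mathbf{w}_L^H \mathbf{b} = \delta\, b_L, \tag{37}$$
with $\delta$ the (real-valued) interference scaling parameter. Using (4), (6) and (9), the solution of (37) is equal to [14]
$$\mathbf{w}_{\mathrm{BLCMV},L} = \mathbf{R}_n^{-1}\mathbf{C}\left(\mathbf{C}^H \mathbf{R}_n^{-1} \mathbf{C}\right)^{-1} \mathbf{g}_L, \tag{38}$$
with the constraint matrix $\mathbf{C}$ and the left response vector $\mathbf{g}_L$ defined as
$$\mathbf{C} = [\mathbf{a},\ \mathbf{b}], \quad \mathbf{g}_L = [a_L,\ \delta\, b_L]^H. \tag{39}$$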
By substituting (38) in (18), it has been shown in [14] that the output SNR of the BLCMV is equal to
[TABLE]
with
[TABLE]
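where $\Re\{\cdot\}$ denotes the real part of a complex number. The output SNR of the BLCMV in (40) is smaller than or equal to the output SNR of the BMVDR in (31), since fewer degrees of freedom are available for noise reduction. In addition, the output SIR of the BLCMV is equal to [14]
$$\mathrm{SIR}_{\mathrm{BLCMV},L}^{\mathrm{out}} = \frac{1}{\delta^2}\, \mathrm{SIR}_L^{\mathrm{in}}, \tag{43}$$
which can hence be directly controlled by the interference scaling parameter $\delta$.
As shown in [14], the BLCMV preserves the binaural cues of both the desired source and the interfering source, i.e.,
$$\mathrm{ITF}_{x}^{\mathrm{out}} = \mathrm{ITF}_x^{\mathrm{in}}, \tag{44}$$
$$\mathrm{ITF}_{u}^{\mathrm{out}} = \mathrm{ITF}_u^{\mathrm{in}}, \tag{45}$$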
and the output MSC of the noise component is equal to
[TABLE]
Because in (41) is a rank-2 matrix, it has been shown in [14] that the output MSC of the noise component is smaller than 1 but is not equal to the input MSC of the noise component. Furthermore, it should be noted that the output MSC of the noise component depends on the relative position of the interfering source to the desired source, cf. (41) and (42), such that it is not straightforward to control the binaural cues of the background noise.
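As an illustration, the BLCMV filter vectors can be computed from the constraint matrix and the response vector and the stated properties can be checked numerically. The ATF vectors, the noise covariance matrix and the value of the interference scaling parameter below are placeholders (assumptions), and `blcmv` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(2)
M, M_L = 4, 2                                   # assumed microphone configuration

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

a, b = crandn(M), crandn(M)                     # hypothetical ATF vectors
A = crandn(M, M)
R_n = A @ A.conj().T + np.eye(M)                # positive definite noise covariance matrix
delta = 0.3                                     # assumed interference scaling parameter

C = np.column_stack([a, b])                     # constraint matrix [a, b]
Rinv_C = np.linalg.solve(R_n, C)                # R_n^{-1} C
Gamma = C.conj().T @ Rinv_C                     # C^H R_n^{-1} C (2 x 2)

def blcmv(ref, delta):
    """BLCMV filter vector for the reference microphone with index ref."""
    g = np.conj(np.array([a[ref], delta * b[ref]]))   # response vector [a_ref, delta*b_ref]^H
    return Rinv_C @ np.linalg.solve(Gamma, g)

w_L, w_R = blcmv(0, delta), blcmv(M_L, delta)

# SIR improvement in dB (the source PSDs cancel in the ratio)
sir_in = abs(a[0]) ** 2 / abs(b[0]) ** 2
sir_out = abs(np.vdot(w_L, a)) ** 2 / abs(np.vdot(w_L, b)) ** 2
sir_gain_db = 10 * np.log10(sir_out / sir_in)   # equals -20*log10(delta)

itf_u_out = np.vdot(w_L, b) / np.vdot(w_R, b)   # equals the input ITF b_L / b_R
```

For `delta = 0.3` the SIR improvement is about 10.5 dB, independent of the source positions, in line with the direct control via the interference scaling parameter described above.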
III-C BMVDR-N beamformer
In addition to preserving the desired source component in the reference microphone signals, the BMVDR with partial noise estimation (BMVDR-N) aims at preserving a scaled version of the noise component in the reference microphone signals [3, 11, 10]. The constrained optimization problem for the left filter vector is given by
$$\min_{\mathbf{w}_L}\ \mathcal{E}\left\{\left|\mathbf{w}_L^H \mathbf{n} - \eta\, n_L\right|^2\right\} \quad \text{s.t.} \quad \mathbf{w}_L^H \mathbf{a} = a_L, \tag{47}$$
with $\eta$ the (real-valued) mixing parameter. It has been shown in [11] that the solution of (47) is equal to
$$\mathbf{w}_{\mathrm{BMVDR\text{-}N},L} = (1-\eta)\, \mathbf{w}_{\mathrm{BMVDR},L} + \eta\, \mathbf{e}_L, \tag{48}$$
with $\mathbf{w}_{\mathrm{BMVDR},L}$ defined in (29). Hence, the output signals of the BMVDR-N can be interpreted as a mixture between the noisy reference microphone signals (scaled with $\eta$) and the output signals of the BMVDR (scaled with $1-\eta$). For $\eta = 0$, the BMVDR-N is equal to the BMVDR, whereas for $\eta = 1$, no beamforming is applied.
Since the output signals of the BMVDR are mixed with the noisy reference microphone signals, the output SNR of the BMVDR-N is always smaller than or equal to the output SNR of the BMVDR [11], i.e.,
$$\mathrm{SNR}_{\mathrm{BMVDR\text{-}N},L}^{\mathrm{out}} \leq \mathrm{SNR}_{\mathrm{BMVDR},L}^{\mathrm{out}}, \tag{49}$$
and decreases with increasing $\eta$. By substituting (48) in (20), it can be shown that the output SIR of the BMVDR-N is equal to
[TABLE]
with
[TABLE]
As shown in [11, 16], the BMVDR-N preserves the binaural cues of the desired source, i.e.,
$$\mathrm{ITF}_{x}^{\mathrm{out}} = \mathrm{ITF}_x^{\mathrm{in}}.$$
By substituting (48) in (24) and (26), it has been shown in [16] and [20] that the output ITF of the interfering source is equal to
[TABLE]
and the output MSC of the noise component is equal to
[TABLE]
It can be seen from (52) and (53) that only for $\eta = 1$ the binaural cues of the undesired sources (interfering source and background noise) are preserved, whereas for $\eta = 0$ the binaural cues of the undesired sources are equal to the binaural cues of the desired source (as for the BMVDR). The mixing parameter $\eta$ hence allows trading off between noise reduction performance and binaural cue preservation of the background noise, or in other words controlling the binaural cues of the background noise. Furthermore, it should be noted that the interference reduction performance in (50) and the output ITF of the interfering source in (52) do not only depend on the mixing parameter $\eta$ but also on the relative position of the interfering source to the desired source, such that it is not straightforward to control both.
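The trade-off described above can be illustrated numerically by mixing the BMVDR filter vector with the reference microphone selection vector. The sketch below uses random placeholder data (assumptions); `bmvdr_n` and `msc_noise` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(3)
M, M_L = 4, 2                                    # assumed microphone configuration

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

a = crandn(M)                                    # hypothetical ATF vector of the desired source
A = crandn(M, M)
R_n = A @ A.conj().T + np.eye(M)                 # positive definite noise covariance matrix
Rinv_a = np.linalg.solve(R_n, a)
gamma_a = np.real(np.vdot(a, Rinv_a))

def selector(ref):
    e = np.zeros(M)
    e[ref] = 1.0
    return e

def bmvdr_n(ref, eta):
    """Mixture of the BMVDR (scaled with 1-eta) and the reference microphone (scaled with eta)."""
    w_bmvdr = Rinv_a / gamma_a * np.conj(a[ref])
    return (1 - eta) * w_bmvdr + eta * selector(ref)

def msc_noise(w_L, w_R):
    cpsd = np.vdot(w_L, R_n @ w_R)
    return np.abs(cpsd) ** 2 / (np.real(np.vdot(w_L, R_n @ w_L)) * np.real(np.vdot(w_R, R_n @ w_R)))

etas = np.linspace(0.0, 1.0, 5)
filters = [(bmvdr_n(0, eta), bmvdr_n(M_L, eta)) for eta in etas]
noise_psd_L = np.array([np.real(np.vdot(w_L, R_n @ w_L)) for w_L, _ in filters])
msc_out = np.array([msc_noise(w_L, w_R) for w_L, w_R in filters])
msc_in = msc_noise(selector(0), selector(M_L))
```

Since the desired source component is preserved for every value of the mixing parameter, the increasing residual noise PSD in `noise_psd_L` corresponds directly to a decreasing output SNR, while `msc_out` moves from 1 (BMVDR) to the input MSC (no beamforming).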
IV BLCMV with partial noise estimation
Aiming at merging the advantages of the BLCMV and the BMVDR-N, i.e., preserving the binaural cues of the interfering source and controlling the binaural cues of the background noise, in Section IV-A we present the BLCMV beamformer with partial noise estimation (BLCMV-N). Similarly as for the BLCMV in [14], in Sections IV-B and IV-C we derive two decompositions for the BLCMV-N which reveal differences and similarities between the BLCMV-N and the BLCMV.
IV-A BLCMV-N Beamformer
Compared to the BMVDR in (28), the BLCMV-N uses an additional constraint to preserve a scaled version of the interfering source component in the reference microphone signals, like the BLCMV in (37), and aims at preserving a scaled version of the noise component in the reference microphone signals, like the BMVDR-N in (47). The constrained optimization problem for the left filter vector is given by
$$\min_{\mathbf{w}_L}\ \mathcal{E}\left\{\left|\mathbf{w}_L^H \mathbf{n} - \eta\, n_L\right|^2\right\} \quad \text{s.t.} \quad \mathbf{w}_L^H \mathbf{a} = a_L, \quad \mathbf{w}_L^H \mathbf{b} = \delta\, b_L. \tag{54}$$
The solution of (54) is equal to (see Appendix A)
$$\mathbf{w}_{\mathrm{BLCMV\text{-}N},L} = \eta\, \mathbf{e}_L + (1-\eta)\, \mathbf{R}_n^{-1}\mathbf{C}\left(\mathbf{C}^H \mathbf{R}_n^{-1} \mathbf{C}\right)^{-1} \bar{\mathbf{g}}_L, \quad \bar{\mathbf{g}}_L = [a_L,\ \bar{\delta}\, b_L]^H, \tag{55}$$
with $\mathbf{C}$ defined in (39) and the adjusted interference scaling parameter equal to
$$\bar{\delta} = \frac{\delta - \eta}{1 - \eta}. \tag{56}$$
Hence, the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals (scaled with $\eta$) and the output signals of a BLCMV (scaled with $1-\eta$) using the adjusted interference scaling parameter $\bar{\delta}$ in (56) instead of the interference scaling parameter $\delta$. For $\eta = 0$, the BLCMV-N is equal to the BLCMV in (38) with $\bar{\delta} = \delta$, whereas for $\eta = 1$, it should be realized that only if $\delta = 1$ no beamforming is applied. Since mixing with the reference microphone signals not only affects the noise component but also the interfering source component, the adjusted interference scaling parameter $\bar{\delta}$ depends on both the interference scaling parameter $\delta$ and the mixing parameter $\eta$ due to the interference reduction constraint in (54). Figure 2 depicts $\bar{\delta}$ as a function of $\eta$ for different values of $\delta$. It can be seen that
[TABLE]
As will be shown in more detail in the following sections, using the parameters $\eta$ and $\delta$ it is possible to control the noise reduction performance, the interference reduction performance and the binaural cues of the background noise for the BLCMV-N.
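The mixture interpretation can be sketched as follows: a BLCMV with an adjusted interference scaling parameter is computed and mixed with the reference microphone selection vector. The adjusted parameter is taken here as (delta - eta) / (1 - eta), which follows from inserting the mixture into the two constraints; all data and parameter values are placeholders (assumptions). As a sanity check, a random perturbation that keeps both constraints satisfied is shown not to lower the partial-noise-estimation cost.

```python
import numpy as np

rng = np.random.default_rng(4)
M = 4                                            # assumed number of microphones

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

a, b = crandn(M), crandn(M)                      # hypothetical ATF vectors
A = crandn(M, M)
R_n = A @ A.conj().T + np.eye(M)                 # positive definite noise covariance matrix
delta, eta = 0.3, 0.5                            # assumed parameter values

C = np.column_stack([a, b])
Rinv_C = np.linalg.solve(R_n, C)
Gamma = C.conj().T @ Rinv_C
e_L = np.zeros(M); e_L[0] = 1.0

def blcmv_left(delta):
    g = np.conj(np.array([a[0], delta * b[0]]))
    return Rinv_C @ np.linalg.solve(Gamma, g)

delta_adj = (delta - eta) / (1 - eta)            # adjusted interference scaling parameter
w_L = eta * e_L + (1 - eta) * blcmv_left(delta_adj)

def cost(w):
    """E{|w^H n - eta * n_L|^2} = (w - eta e_L)^H R_n (w - eta e_L)."""
    d = w - eta * e_L
    return np.real(np.vdot(d, R_n @ d))

# Perturbation orthogonal to a and b, i.e. it leaves both constraints untouched
r = crandn(M)
d = r - C @ np.linalg.solve(C.conj().T @ C, C.conj().T @ r)
```

With `delta = 0.3` and `eta = 0.5` the adjusted parameter is negative (-0.4), which compensates for the interfering source component that is reintroduced by mixing with the reference microphone signal.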
IV-B Decomposition into two BLCMVs
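In [14] it has been shown that the BLCMV in (38) can be decomposed as the sum of two sub-BLCMVs, i.e.,
$$\mathbf{w}_{\mathrm{BLCMV},L} = \mathbf{w}_{x,L} + \delta\, \mathbf{w}_{u,L}, \tag{58}$$
with
$$\mathbf{w}_{x,L} = \mathbf{R}_n^{-1}\mathbf{C}\left(\mathbf{C}^H \mathbf{R}_n^{-1} \mathbf{C}\right)^{-1} \mathbf{g}_{x,L}, \tag{59}$$
$$\mathbf{w}_{u,L} = \mathbf{R}_n^{-1}\mathbf{C}\left(\mathbf{C}^H \mathbf{R}_n^{-1} \mathbf{C}\right)^{-1} \mathbf{g}_{u,L}, \tag{60}$$
and the respective response vectors
$$\mathbf{g}_{x,L} = [a_L,\ 0]^H, \quad \mathbf{g}_{u,L} = [0,\ b_L]^H. \tag{61}$$
The sub-BLCMV $\mathbf{w}_{x,L}$ in (59) preserves the desired source component in the reference microphone signals and steers a null towards the interfering source, whereas the sub-BLCMV $\mathbf{w}_{u,L}$ in (60) preserves the interfering source component in the reference microphone signals and steers a null towards the desired source. Using (55), it can be easily seen that the proposed BLCMV-N can be decomposed as
$$\mathbf{w}_{\mathrm{BLCMV\text{-}N},L} = \eta\, \mathbf{e}_L + (1-\eta)\, \mathbf{w}_{x,L} + (\delta-\eta)\, \mathbf{w}_{u,L}. \tag{62}$$
Hence, the BLCMV-N can be interpreted as a mixture of the reference microphone signals (scaled with $\eta$), a BLCMV that preserves the desired source and rejects the interfering source (scaled with $1-\eta$) and a BLCMV that preserves the interfering source and rejects the desired source (scaled with $\delta-\eta$). Since the scaling of the sub-BLCMV $\mathbf{w}_{x,L}$ controls the desired source component without affecting the interfering source component and the scaling of the sub-BLCMV $\mathbf{w}_{u,L}$ controls the interfering source component without affecting the desired source component [14], it can be directly observed from the scaling factors in (62) that the desired source component is not distorted and the interfering source component is scaled with $\delta$.
IV-C Decomposition using Binauralization Postfilters
In [14] it has also been shown that the sub-BLCMV $\mathbf{w}_{x,L}$ in (59) for the left hearing device and the sub-BLCMV $\mathbf{w}_{x,R}$ for the right hearing device (defined similarly as $\mathbf{w}_{x,L}$) can be written using a common spatial filter and two binauralization postfilters as
$$\mathbf{w}_{x,L} = a_L^*\, \mathbf{w}_x, \quad \mathbf{w}_{x,R} = a_R^*\, \mathbf{w}_x, \tag{63}$$
with the common desired BLCMV (D-BLCMV) given by
$$\mathbf{w}_x = \mathbf{R}_n^{-1}\mathbf{C}\left(\mathbf{C}^H \mathbf{R}_n^{-1} \mathbf{C}\right)^{-1} [1,\ 0]^T, \tag{64}$$
and the ATFs $a_L$ and $a_R$ between the desired source and the reference microphones used as binauralization postfilters. Similarly, the sub-BLCMV $\mathbf{w}_{u,L}$ in (60) and the sub-BLCMV $\mathbf{w}_{u,R}$ (defined similarly as $\mathbf{w}_{u,L}$) can be written as
$$\mathbf{w}_{u,L} = b_L^*\, \mathbf{w}_u, \quad \mathbf{w}_{u,R} = b_R^*\, \mathbf{w}_u, \tag{65}$$
with the common interference BLCMV (I-BLCMV) given by
$$\mathbf{w}_u = \mathbf{R}_n^{-1}\mathbf{C}\left(\mathbf{C}^H \mathbf{R}_n^{-1} \mathbf{C}\right)^{-1} [0,\ 1]^T, \tag{66}$$
and the ATFs $b_L$ and $b_R$ between the interfering source and the reference microphones used as binauralization postfilters.
Using (63) and (65) in (62), the BLCMV-N can be decomposed as
$$\mathbf{w}_{\mathrm{BLCMV\text{-}N},L} = \eta\, \mathbf{e}_L + (1-\eta)\, a_L^*\, \mathbf{w}_x + (\delta-\eta)\, b_L^*\, \mathbf{w}_u, \tag{67}$$
$$\mathbf{w}_{\mathrm{BLCMV\text{-}N},R} = \eta\, \mathbf{e}_R + (1-\eta)\, a_R^*\, \mathbf{w}_x + (\delta-\eta)\, b_R^*\, \mathbf{w}_u. \tag{68}$$
Figure 3 depicts this decomposition of the BLCMV-N using common spatial filters and binauralization postfilters. The output signals of the BLCMV-N can hence be interpreted as a mixture between the reference microphone signals (scaled with $\eta$), the binauralized output signals of the D-BLCMV (scaled with $1-\eta$) and the binauralized output signals of the I-BLCMV (scaled with $\delta-\eta$).
Due to the constraints in (54), the BLCMV-N perfectly preserves the desired source component and scales the interfering source component with $\delta$. Using (67) and (68), the noise components in the output signals of the BLCMV-N are equal to
$$z_{n,L} = \eta\, n_L + (1-\eta)\, a_L\, z_{n,x} + (\delta-\eta)\, b_L\, z_{n,u}, \tag{69}$$
$$z_{n,R} = \eta\, n_R + (1-\eta)\, a_R\, z_{n,x} + (\delta-\eta)\, b_R\, z_{n,u}, \tag{70}$$
with $z_{n,x} = \mathbf{w}_x^H \mathbf{n}$ and $z_{n,u} = \mathbf{w}_u^H \mathbf{n}$ the noise component in the output signal of the D-BLCMV and the I-BLCMV, respectively. The noise component in the output signals of the BLCMV-N can hence be interpreted as a mixture between the noise component in the reference microphone signals (scaled with $\eta$), a coherent residual noise source ($z_{n,x}$) coming from the direction of the desired source (scaled with $1-\eta$) and a coherent residual noise source ($z_{n,u}$) coming from the direction of the interfering source (scaled with $\delta-\eta$).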
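Both decompositions can be verified numerically. The sketch below builds two common spatial filters (one preserving the desired source and nulling the interfering source, one doing the opposite), applies the reference-microphone ATFs as postfilters and compares the result with the mixture of the reference microphone and a BLCMV with adjusted interference scaling parameter. The names `w_x` and `w_u` and all numerical values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
M = 4                                             # assumed number of microphones

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

a, b = crandn(M), crandn(M)                       # hypothetical ATF vectors
A = crandn(M, M)
R_n = A @ A.conj().T + np.eye(M)                  # positive definite noise covariance matrix
delta, eta = 0.3, 0.5                             # assumed parameter values

C = np.column_stack([a, b])
Rinv_C = np.linalg.solve(R_n, C)
Gamma = C.conj().T @ Rinv_C
W = Rinv_C @ np.linalg.inv(Gamma)                 # R_n^{-1} C (C^H R_n^{-1} C)^{-1}
w_x, w_u = W[:, 0], W[:, 1]                       # common D-BLCMV and I-BLCMV filters
e_L = np.zeros(M); e_L[0] = 1.0

# Decomposition with binauralization postfilters (left side)
w_dec = eta * e_L + (1 - eta) * np.conj(a[0]) * w_x + (delta - eta) * np.conj(b[0]) * w_u

# Mixture of reference microphone and BLCMV with adjusted scaling parameter
delta_adj = (delta - eta) / (1 - eta)
g_adj = np.conj(np.array([a[0], delta_adj * b[0]]))
w_mix = eta * e_L + (1 - eta) * (W @ g_adj)

# Noise component in the left output for one noise snapshot
n = crandn(M)
z_n = np.vdot(w_dec, n)
z_n_parts = eta * n[0] + (1 - eta) * a[0] * np.vdot(w_x, n) + (delta - eta) * b[0] * np.vdot(w_u, n)
```

The right filter vector follows in the same way by using the right selection vector and the ATFs at the right reference microphone as postfilters, while `w_x` and `w_u` stay the same.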
V Performance of the BLCMV-N
In this section we provide a performance analysis of the proposed BLCMV-N. In Section V-A we derive the output PSDs of the signal components. In Sections V-B and V-C we analyze the noise and interference reduction performance and the binaural cue preservation performance. Finally, in Section V-D we discuss the setting of the mixing parameter $\eta$ and the interference scaling parameter $\delta$.
V-A Output Power Spectral Densities
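Due to the constraints in (54), the output PSDs of the desired and interfering source components in the left output signal of the BLCMV-N are equal to, cf. (13),
$$p_{x,L}^{\mathrm{out}} = p_{x,L}^{\mathrm{in}}, \tag{71}$$
$$p_{u,L}^{\mathrm{out}} = \delta^2\, p_{u,L}^{\mathrm{in}}. \tag{72}$$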
Furthermore, the output PSD of the noise component in the left output signal of the BLCMV-N is equal to (see Appendix B)
[TABLE]
with
[TABLE]
with defined in (30), defined in (33), and and defined in (42). It can be seen that the output PSD of the noise component for the BLCMV-N is a quadratic function in both the mixing parameter and the interference scaling parameter . By comparing (V-A) to (41), it can be observed that
[TABLE]
where denotes the expression for the BLCMV in (41) with , corresponding to no suppression of the interfering source. Please note that for , , and for and , . By using (75) in (73), it follows that
[TABLE]
V-B Noise and Interference Reduction Performance
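By substituting (71) and (73) in (18), the left output SNR of the BLCMV-N is equal to
$$\mathrm{SNR}_{\mathrm{BLCMV\text{-}N},L}^{\mathrm{out}} = \frac{p_{x,L}^{\mathrm{in}}}{p_{n,L}^{\mathrm{out}}}, \tag{77}$$
with $p_{n,L}^{\mathrm{out}}$ the output PSD of the noise component in (73), which depends on both the mixing parameter $\eta$ and the interference scaling parameter $\delta$. Using (76) and realizing that the output PSD of the noise component in the left output signal of the BLCMV (for any value of $\delta$) is smaller than or equal to the PSD of the noise component in the left reference microphone signal, the output SNR of the BLCMV-N in (77) is smaller than or equal to the output SNR of the BLCMV in (40), i.e.,
$$\mathrm{SNR}_{\mathrm{BLCMV\text{-}N},L}^{\mathrm{out}} \leq \mathrm{SNR}_{\mathrm{BLCMV},L}^{\mathrm{out}}. \tag{78}$$
By substituting (71) and (72) in (20), the left output SIR of the BLCMV-N is equal to
$$\mathrm{SIR}_{\mathrm{BLCMV\text{-}N},L}^{\mathrm{out}} = \frac{p_{x,L}^{\mathrm{in}}}{\delta^2\, p_{u,L}^{\mathrm{in}}} = \frac{1}{\delta^2}\, \mathrm{SIR}_L^{\mathrm{in}}, \tag{79}$$
which is equal to the left output SIR of the BLCMV in (43) and solely controlled by the interference scaling parameter $\delta$. For $\eta = 0$, the left output SNR of the BLCMV-N is equal to the left output SNR of the BLCMV in (40), while for $\eta = 1$ and $\delta = 1$, the left output SNR of the BLCMV-N is equal to the left input SNR because no beamforming is applied.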
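The statements on the output SNR and the output SIR can be checked with a small numerical experiment. In the sketch below the BLCMV-N filter vector is written without the division by (1 - eta), which is algebraically equivalent to mixing the reference microphone with a BLCMV using the adjusted interference scaling parameter and also covers eta = 1. All data are random placeholders and `blcmv_n` and `snr_sir` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(6)
M = 4                                             # assumed number of microphones

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

a, b = crandn(M), crandn(M)                       # hypothetical ATF vectors
A = crandn(M, M)
R_n = A @ A.conj().T + np.eye(M)                  # positive definite noise covariance matrix
C = np.column_stack([a, b])
Rinv_C = np.linalg.solve(R_n, C)
Gamma = C.conj().T @ Rinv_C
e_L = np.zeros(M); e_L[0] = 1.0

def blcmv_n(eta, delta):
    """Left BLCMV-N filter vector; (1-eta) is absorbed into the response vector."""
    g = np.conj(np.array([(1 - eta) * a[0], (delta - eta) * b[0]]))
    return eta * e_L + Rinv_C @ np.linalg.solve(Gamma, g)

def snr_sir(w, p_x=1.0, p_u=1.0):
    """Narrowband output SNR and SIR for assumed unit source PSDs."""
    p_x_out = p_x * abs(np.vdot(w, a)) ** 2
    snr = p_x_out / np.real(np.vdot(w, R_n @ w))
    sir = p_x_out / (p_u * abs(np.vdot(w, b)) ** 2)
    return snr, sir

delta = 0.3
etas = np.linspace(0.0, 1.0, 6)
results = [snr_sir(blcmv_n(eta, delta)) for eta in etas]
snr_blcmv, sir_blcmv = results[0]                 # eta = 0 corresponds to the BLCMV
snr_in = abs(a[0]) ** 2 / np.real(R_n[0, 0])      # input SNR for unit desired source PSD
```

In this experiment the output SIR is the same for every value of the mixing parameter, whereas the output SNR never exceeds that of the BLCMV.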
V-C Binaural Cue Preservation
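Similarly as for the BLCMV, due to the constraints in (54) the BLCMV-N preserves the binaural cues of both the desired source and the interfering source, i.e.,
$$\mathrm{ITF}_{x}^{\mathrm{out}} = \mathrm{ITF}_x^{\mathrm{in}}, \tag{80}$$
$$\mathrm{ITF}_{u}^{\mathrm{out}} = \mathrm{ITF}_u^{\mathrm{in}}. \tag{81}$$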
Using (26), the output IC of the noise component for the BLCMV-N is equal to (see Appendix B for derivation of components)
[TABLE]
with defined in (V-A). Since depends on both the mixing parameter and the interference scaling parameter , also the output IC of the noise component in (V-C) depends on both parameters. Using (27), the output MSC of the noise component for the BLCMV-N is equal to
[TABLE]
Since for $\eta = 0$ the BLCMV-N is equal to the BLCMV, the output MSC of the noise component is smaller than 1, see Section III-B. It should however be realized that in contrast to the BMVDR-N discussed in Section III-C, for $\eta = 1$ the BLCMV-N does not always preserve the MSC of the noise component. Only for $\eta = 1$ and $\delta = 1$ the binaural cues of all signal components are preserved because no beamforming is applied.
V-D Parameter Settings
Maximizing the left output SNR in (77) corresponds to minimizing the denominator, i.e., using (75),
[TABLE]
Setting the derivative of (84) with respect to the mixing parameter $\eta$ equal to zero yields
$$\eta = 0 \tag{85}$$
as the optimal mixing parameter in terms of left (and right) output SNR. The derivative of (84) with respect to the interference scaling parameter $\delta$ is equal to, using (41),
[TABLE]
Setting (86) to zero and solving for yields the optimal interference scaling parameter in terms of left output SNR, i.e.,
[TABLE]
with
[TABLE]
As can be seen from (79), the output SIR is not affected by the mixing parameter but is solely determined by the interference scaling parameter .
VI Simulations
In Section VI-A we first validate the expressions derived in the previous sections using measured anechoic ATFs. In Section VI-B we then experimentally compare the performance of the proposed BLCMV-N with the BMVDR, BLCMV and BMVDR-N using recorded signals in a reverberant environment with a competing speaker and multi-talker babble noise. Finally, in Section VI-C we compare the spatial impression of the considered binaural beamforming algorithms using a perceptual listening test.
VI-A Validation Using Measured Anechoic ATFs
To validate the derived expressions for the considered algorithms we used measured anechoic ATFs of two behind-the-ear hearing aids mounted on a head-and-torso-simulator (HATS) [24]. Each hearing aid has two microphones () with an inter-microphone distance of about . We chose the front microphone on each hearing aid as reference microphone. The ATFs were calculated from anechoic impulse responses using a 512-point FFT at a sampling rate of .
The desired source was placed at (in front) and the interfering source was placed at (to the left), both at a distance of from the HATS. The desired source covariance matrix and the interfering source covariance matrix were constructed using the ATF vector of the desired source and the ATF vector of the interfering source according to (11), where the PSD of the desired source and the PSD of the interfering source were both set to 1. As background noise we considered a combination of spatially white and cylindrically isotropic noise, i.e., the noise covariance matrix was constructed as
[TABLE]
with the PSD of the spatially white noise, the -dimensional identity matrix, the PSD of the cylindrically isotropic noise and its spatial coherence matrix. The -th element of the spatial coherence matrix was calculated using all available anechoic ATFs as
[TABLE]
with the anechoic ATF at angle and the total number of angles in the database ( for [24]). The PSD of the spatially white noise was set to , while the PSD of the cylindrically isotropic noise was set to 1.
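The construction of the noise covariance matrix can be sketched as follows. The sketch assumes that each element of the spatial coherence matrix is the sum over all angles of the product of one ATF with the complex conjugate of the other, normalised by the corresponding ATF energies, which is a common choice but an assumption here. The function names, the random stand-in for the ATF database and the PSD values are hypothetical.

```python
import numpy as np

def isotropic_coherence(atfs):
    """Spatial coherence matrix from ATFs of shape (num_angles, M) for one frequency bin.

    Assumed form: Gamma[i, j] = sum_theta a_i a_j^* / sqrt(sum |a_i|^2 * sum |a_j|^2).
    """
    cross = atfs.T @ atfs.conj()            # (M, M) sums over all angles
    power = np.real(np.diag(cross))         # per-microphone ATF energy
    return cross / np.sqrt(np.outer(power, power))

def noise_covariance(atfs, psd_white, psd_iso):
    """Spatially white noise plus cylindrically isotropic noise."""
    M = atfs.shape[1]
    return psd_white * np.eye(M) + psd_iso * isotropic_coherence(atfs)

# Hypothetical stand-in for a measured database: 72 angles, 4 microphones.
rng = np.random.default_rng(1)
atfs = rng.standard_normal((72, 4)) + 1j * rng.standard_normal((72, 4))
psd_white, psd_iso = 1e-3, 1.0              # placeholder PSD values
Gamma = isotropic_coherence(atfs)
R_n = noise_covariance(atfs, psd_white, psd_iso)
print(np.round(np.abs(Gamma), 3))
```

By construction the coherence matrix is Hermitian with a unit diagonal, so adding the spatially white component keeps the noise covariance matrix positive definite and hence invertible.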
VI-A1 Noise and Interference Reduction Performance
Using (17) and (18), Figure 4 depicts the left SNR improvement at for the BLCMV-N for different values of the mixing parameter and the interference scaling parameter and the BMVDR-N for different values of the mixing parameter . As expected, the BMVDR (i.e., BMVDR-N for ) yields the largest SNR improvement (cf. (78)). Since the BMVDR-N mixes the output signals of the BMVDR with the noisy reference microphone signals, it can be observed that increasing the mixing parameter reduces the SNR improvement of the BMVDR-N compared to the BMVDR (). For the BLCMV-N, both and affect the SNR improvement, which is in line with (77). Similarly to the BMVDR-N, the BLCMV-N mixes the output signals of a BLCMV with the noisy reference microphone signals. Hence, it can be observed that for any value of the interference scaling parameter , increasing the mixing parameter reduces the SNR improvement of the BLCMV-N compared to the BLCMV (), which is in line with (78). Since fewer degrees of freedom are available for noise reduction, the BLCMV () yields a smaller SNR improvement than the BMVDR (), as discussed in Section III-B. Using (87), the interference scaling parameter maximizing the output SNR was equal to for the considered acoustic scenario. As expected, it can be observed that using leads to the largest SNR improvement of all considered values of . For large values of the mixing parameter , the BLCMV-N yields a larger SNR improvement than the BMVDR-N. It should be noted that the exact behaviour depends on the interference scaling parameter and the relative position of the interfering source to the desired source.
Using (19) and (20), Figure 5 depicts the left SIR improvement at for the BLCMV-N for different values of the mixing parameter and the interference scaling parameter and the BMVDR-N for different values of the mixing parameter . As expected from (43) and (79), both the BLCMV-N and the BLCMV () yield the same SIR improvement, which is solely controlled by the interference scaling parameter . Hence, increasing the interference scaling parameter reduces the SIR improvement for both the BLCMV-N and the BLCMV. For the BMVDR-N it can be observed that increasing the mixing parameter reduces the SIR improvement. It should be noted that the exact behaviour depends on the relative position of the interfering source to the desired source, as can be seen from (50) and (III-C).
VI-A2 Binaural Cue Preservation of Background Noise
For different frequencies, Figure 6 depicts the input MSC in (27) of the noise component (Input) and the output MSC in (27) of the noise component for the BLCMV in (46) for different values of the interference scaling parameter , the BMVDR-N in (53) for different values of the mixing parameter and the BLCMV-N for different values of the mixing parameter and the interference scaling parameter . Although the BLCMV is not designed to preserve the MSC of the noise component, it can be observed that an output MSC smaller than 1 is obtained, especially for large values of [14]. However, since the output MSC of the noise component depends on the relative position of the interfering source to the desired source, it cannot be easily controlled. Since the BMVDR-N mixes the output signals of the BMVDR with the noisy reference microphone signals, it can be observed that the output MSC of the noise component is smaller than 1, and for the MSC is perfectly preserved (but no beamforming is applied). For the BLCMV-N, it can be observed that both and influence the output MSC of the noise component, as discussed in Section V-C. For , the output MSC of the noise component for the BLCMV-N is obviously equal to the output MSC of the noise component for the BLCMV. For a fixed value of , it can be observed that the output MSC of the noise component approaches the input MSC of the noise component for increasing , although it should be realized that perfect preservation of the MSC of the noise component is only possible for (cf. Section V-C).
For several values of the mixing parameter , Figure 7 depicts the MSC error of the noise component for the BLCMV-N and the BMVDR-N, averaged over all frequencies, i.e.,
[TABLE]
with the frequency bin index and the total number of frequency bins. As expected, the BMVDR () yields the largest MSC error of the noise component, and increasing the mixing parameter reduces the frequency-averaged MSC error of the noise component for the BMVDR-N [16]. For the considered acoustic scenario, it can be observed for the BLCMV-N that for any value of the interference scaling parameter , increasing the mixing parameter reduces the frequency-averaged MSC error of the noise component compared to the BLCMV (). Further, it can be observed that for small values of the interference scaling parameter , the effect of the mixing parameter is larger than for large values of the interference scaling parameter , for which the frequency-averaged MSC error is relatively small for all values of the mixing parameter . These results clearly show that the mixing parameter in the BLCMV-N makes it possible to control the binaural cues of the background noise.
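A minimal sketch of how the MSC of the noise component and its frequency-averaged error could be computed is given below. It assumes that the MSC is the squared magnitude of the interaural coherence of the filtered noise and that the error is the mean absolute difference between input and output MSC over all frequency bins. The function names `msc` and `msc_error`, the reference microphone indices and the random test data are hypothetical.

```python
import numpy as np

def msc(R_n, w_left, w_right):
    """MSC of the noise component after filtering with w_left and w_right."""
    cross = np.vdot(w_left, R_n @ w_right)              # w_L^H R_n w_R
    p_left = np.real(np.vdot(w_left, R_n @ w_left))     # left output noise PSD
    p_right = np.real(np.vdot(w_right, R_n @ w_right))  # right output noise PSD
    return float(np.abs(cross) ** 2 / (p_left * p_right))

def msc_error(R_n_bins, W_left, W_right, ref_left, ref_right):
    """Mean absolute difference between input and output MSC (assumed error form)."""
    errors = []
    for R_n, w_l, w_r in zip(R_n_bins, W_left, W_right):
        eye = np.eye(R_n.shape[0])
        msc_in = msc(R_n, eye[ref_left], eye[ref_right])  # reference microphones
        errors.append(abs(msc_in - msc(R_n, w_l, w_r)))
    return float(np.mean(errors))

# Hypothetical data: one random positive definite noise covariance matrix.
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 50)) + 1j * rng.standard_normal((4, 50))
R_n = X @ X.conj().T / 50
e_l, e_r = np.eye(4)[0], np.eye(4)[2]

w = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(msc(R_n, e_l, e_r))     # input MSC of the noise component
print(msc(R_n, w, 2.0 * w))   # parallel left and right filters
```

The second call illustrates why a beamformer whose left and right filters are parallel yields a fully coherent output noise component: the scaling factor cancels in the ratio, so the output MSC equals 1 up to rounding.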
VI-B Experimental Results Using Reverberant Recordings
For a more realistic evaluation, we compare the performance of the considered binaural beamforming algorithms using reverberant recordings. Similarly to Section VI-A, the experimental setup consists of two hearing aids, each with two microphones, mounted on a HATS in a cafeteria with a reverberation time of approximately [24]. The desired source was again placed at (at a distance of about ), while the interfering source was again placed at (at a distance of about ), see [24] for more details. The desired and interfering source components were generated by convolving clean speech signals with the measured reverberant room impulse responses corresponding to the desired source and interfering source positions. The desired source was a male German speaker, speaking eight sentences with a pause of between the sentences. The interfering source was a male Dutch speaker, speaking seven sentences with a pause of between the sentences. As background noise we used realistic recordings [24], consisting of multi-talker babble noise, clacking plates and temporally dominant competing speakers. The used background noise hence clearly differed from the perfectly diffuse noise in Section VI-A. The entire signal had a length of about . The desired source and the background noise were active the entire time, whereas the interfering source only became active after about . The desired source component, the interfering source component and the noise component were mixed at an input SNR of and input SIR of in the right reference microphone. Again, we chose the front microphone on each hearing aid as reference microphone.
The processing was performed at a sampling rate of in the STFT domain with a frame length of samples and a square-root Hann window with overlap. We used an oracle voice activity detector (i.e., using the desired source and interfering source signals) to estimate the noise covariance matrix , the undesired covariance matrix (interfering source plus background noise) and (desired source plus background noise) over the entire signal. All binaural beamforming algorithms were implemented using relative transfer function (RTF) vectors [25], relating the ATF vectors in (4) to the reference microphones. Using the covariance whitening method (see [14, 26] for further details) the RTF vectors of the desired source and the interfering source were estimated based on generalised eigenvalue decomposition of and or and , respectively. The mixing parameter was set to and the interference scaling parameter was set to .
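The covariance whitening step can be illustrated with a minimal sketch. It computes the generalised eigenvalue decomposition implicitly by whitening with a Cholesky factor of the second covariance matrix, which is one standard way to implement the method and not necessarily the implementation used here. The function name `rtf_covariance_whitening` and the synthetic rank-1 test data are hypothetical.

```python
import numpy as np

def rtf_covariance_whitening(R_y, R_n, ref=0):
    """Estimate an RTF vector from a covariance pair via covariance whitening."""
    L = np.linalg.cholesky(R_n)                    # R_n = L L^H
    L_inv = np.linalg.inv(L)
    R_white = L_inv @ R_y @ L_inv.conj().T         # whitened covariance matrix
    R_white = 0.5 * (R_white + R_white.conj().T)   # enforce Hermitian symmetry
    _, eigvecs = np.linalg.eigh(R_white)           # eigenvalues in ascending order
    v = eigvecs[:, -1]                             # principal eigenvector
    a = L @ v                                      # de-whitening
    return a / a[ref]                              # normalise to the reference microphone

# Synthetic check: rank-1 source plus noise with known ATF vector a_true.
rng = np.random.default_rng(3)
M = 4
a_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)
X = rng.standard_normal((M, 50)) + 1j * rng.standard_normal((M, 50))
R_n = X @ X.conj().T / 50 + 0.1 * np.eye(M)
R_y = 2.0 * np.outer(a_true, a_true.conj()) + R_n

rtf = rtf_covariance_whitening(R_y, R_n)
print(np.max(np.abs(rtf - a_true / a_true[0])))
```

With exact covariance matrices the estimate coincides with the true RTF vector up to rounding. With estimated covariance matrices, as in the experiments, estimation errors propagate into the RTF vectors.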
As objective performance measures for noise and interference reduction performance, we used the left and the right SNR improvement (, ) and the left and the right SIR improvement (, ). As objective performance measure for binaural cue preservation of the background noise we used the frequency-averaged MSC error of the noise component () as defined in (91). All objective performance measures were computed using the reference microphone signals and the output signals of all considered algorithms. Table I presents the objective performance measures for all considered algorithms.
VI-B1 Noise and Interference Reduction Performance
In terms of noise reduction performance, it can be observed that, as expected, the BMVDR yields the highest SNR improvement (13.0 dB for the left and 12.9 dB for the right side, see Table I). All other algorithms yield a lower SNR improvement: the BLCMV due to the additional constraint for the interfering source, the BMVDR-N due to the mixing with the noisy reference microphone signals, and the BLCMV-N due to both effects. The partial noise estimation for the BLCMV-N seems to result in a smaller drop in noise reduction performance compared to the BLCMV (2.5 dB for the left side, 2.2 dB for the right side) than for the BMVDR-N compared to the BMVDR (4.4 dB for the left side, 4.3 dB for the right side). Please note that for both the BMVDR-N and the BLCMV-N this drop in noise reduction performance depends on the relative position of the interfering source to the desired source.
In terms of interference reduction performance, it can be observed that both the BLCMV and the BLCMV-N approximately lead to the same SIR improvement (for the left and the right side), which is in line with the theoretical SIR improvement in (43) and (79), i.e., 10.5 dB. The fact that this theoretical SIR improvement is not reached and the fact that the SIR improvements for the BLCMV and BLCMV-N are not exactly the same is due to estimation errors in the covariance matrices, as already noted in [14, 17]. In addition, it can be observed that the BMVDR and BMVDR-N lead to very low (even negative) SIR improvements, which is presumably due to the fact that the interfering source is relatively close to the desired source.
VI-B2 Binaural Cue Preservation of Background Noise
As expected, the BMVDR yields the largest MSC error of the noise component . As discussed in Section III-B, the output MSC of the noise component for the BLCMV is typically smaller than 1, hence leading to a smaller MSC error compared to the BMVDR. Due to the mixing with the noisy reference microphone signals, both the BMVDR-N and the BLCMV-N yield a much smaller MSC error of the noise component than the BMVDR and the BLCMV, where the MSC error is slightly smaller for the BMVDR-N than for the BLCMV-N.
In conclusion, the objective performance measures show that the BLCMV-N leads to a very similar interference reduction as the BLCMV, while providing a trade-off between noise reduction performance (slightly worse than the BLCMV) and binaural cue preservation of the background noise (much better than the BLCMV).
VI-C Perceptual Listening Test
To further investigate the spatial impression of the different output signal components for the four considered algorithms, we conducted a perceptual listening test similar to the one in [21]. The desired source was now placed at and the interfering source was placed at , in order to enhance the perceived spatial differences between both sources. The desired source component, the interfering source component and the noise component were mixed at an input SNR of and input SIR of in the right reference microphone. Thirteen self-reported normal-hearing subjects participated in the perceptual listening test; none of the authors participated. All subjects can be considered expert listeners, i.e., they were familiar with similar perceptual listening tests, and gave informed consent. The listening test was conducted in a soundproof listening booth using an RME Fireface UCX sound card with Sennheiser HD 580 headphones.
Using a procedure similar to the MUlti-Stimulus Test with Hidden Reference and Anchor (MUSHRA) [27], the task was to rate the perceived spatial difference with respect to a reference signal. For a coherent source (e.g., interfering source), this corresponds to rating differences in perceived source location, whereas for a diffuse noise field this corresponds to rating differences in perceived diffuseness. A score of 0 is associated with a large perceived spatial difference, whereas a score of 100 is associated with no perceived spatial difference. As reference signal we used the (unprocessed) reference microphone signals, while as anchor signal we used the left reference microphone signal, played back to both ears. The anchor signal was hence a monaural signal with no binaural cues, which is perceived in the center of the head.
We conducted three evaluations, where only some components were active in the output signals, the reference signal and the anchor signal. In the first evaluation, only the desired source component and the interfering source component (i.e., no noise component) were active and the task was to rate the spatial difference for the interfering source. In the second evaluation, only the desired source component and the noise component (i.e., no interfering source component) were active and the task was to rate the spatial difference for the background noise. In the third evaluation, all signal components were active and the task was to rate the spatial difference for the interfering source and the background noise simultaneously. To familiarize the subjects with the tasks and the sound material, a training round was performed. Audio samples for all binaural beamforming algorithms and the unprocessed input signals are available online (see https://uol.de/en/sigproc/research/audio-demos/binaural-noise-reduction/blcmv-n-beamformer).
The MUSHRA scores for the three evaluations are shown in Figure 8. A one-way repeated-measures ANOVA was performed. The analysis revealed a significant within-subjects effect for all three evaluations. Hence, post-hoc comparison t-tests with Bonferroni correction were performed [28].
Interfering source
The within-subjects effect was significant [, , Greenhouse-Geisser correction]. As expected, the BLCMV and the BLCMV-N preserved the spatial impression of the interfering source significantly better than the BMVDR and the BMVDR-N (). The BMVDR-N performed significantly better than the BMVDR (), which is not unexpected since the interfering source component is also mixed with the mixing parameter . No significant difference was found between the BLCMV and the BLCMV-N ().
Background noise
The within-subjects effect was significant [, , Greenhouse-Geisser correction]. As expected, the BMVDR-N and the BLCMV-N, both using partial noise estimation, preserved the spatial impression of the background noise significantly better than the BMVDR and the BLCMV (). No significant difference was found between the BMVDR-N and the BLCMV-N () and between the BMVDR and BLCMV ().
Complete acoustic scene
The within-subjects effect was significant [, , Greenhouse-Geisser correction]. In terms of preservation of the spatial impression of the complete acoustic scene, the BMVDR-N scored significantly higher than the BMVDR (), the BLCMV scored significantly higher than the BMVDR-N (), and the proposed BLCMV-N scored significantly higher than the BLCMV ().
In summary, the results of the listening test showed that the BLCMV-N is capable of preserving the spatial impression of an interfering source and background noise in a realistic acoustic scenario, outperforming all other considered binaural beamforming algorithms in terms of spatial impression.
VII Conclusions
In this paper we proposed the BLCMV-N, merging the advantages of the BLCMV and the BMVDR-N, i.e., preserving the binaural cues of the interfering source and controlling the reduction of the interfering source as well as the binaural cues of the background noise. We showed that the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals and the output signals of a BLCMV using an adjusted interference scaling parameter. We provided a theoretical comparison between the BMVDR, the BLCMV, the BMVDR-N and the proposed BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. The obtained analytical expressions were first validated using measured anechoic acoustic transfer functions. Experimental results using recorded signals in a realistic reverberant environment showed that the BLCMV-N leads to a very similar interference reduction as the BLCMV, while providing a trade-off between noise reduction performance (slightly worse than the BLCMV) and binaural cue preservation of the background noise (much better than the BLCMV). In addition, the results of a perceptual listening test with 13 normal-hearing participants showed that the proposed BLCMV-N is capable of preserving the spatial impression of an interfering source and background noise in a realistic acoustic scenario, outperforming all other considered binaural beamforming algorithms in terms of spatial impression.
Appendix A Derivation of the BLCMV-N
Using (4), (6) and (39), the constrained optimization problem in (54) can be reformulated as
[TABLE]
This constrained optimization problem can be solved using the method of Lagrange multipliers, where the Lagrangian function is given by
[TABLE]
with denoting the 2-dimensional vector of Lagrangian multipliers. Setting the gradient with respect to
[TABLE]
equal to yields
[TABLE]
Substituting (95) into the constraint and solving for the Lagrangian multiplier yields
[TABLE]
Substituting (96) into (95), the solution to (54) is given by
[TABLE]
where, using (39),
[TABLE]
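The Lagrangian solution can be checked numerically with a minimal sketch. It assumes that the cost function is the output power of the difference between the filtered noise and the scaled noise in the reference microphone, i.e., (w - eta e)^H R_n (w - eta e), subject to two linear constraints C^H w = g for the desired and the interfering source. Under this assumption, substituting u = w - eta e turns the problem into a standard LCMV problem with an adjusted constraint vector. The names `blcmv_n_closed_form`, `cost`, `eta` and `delta` and the random test data are hypothetical.

```python
import numpy as np

def blcmv_n_closed_form(R_n, C, g, eta, ref=0):
    """Minimiser of (w - eta e)^H R_n (w - eta e) subject to C^H w = g (assumed problem)."""
    e = np.zeros(R_n.shape[0], dtype=complex)
    e[ref] = 1.0
    Rinv_C = np.linalg.solve(R_n, C)
    # Lagrange multiplier term, obtained from the constraint after substituting u = w - eta e
    lam = np.linalg.solve(C.conj().T @ Rinv_C, g - eta * (C.conj().T @ e))
    return eta * e + Rinv_C @ lam

def cost(w, R_n, eta, ref=0):
    """Value of the assumed cost function for a given filter."""
    d = w.astype(complex)
    d[ref] -= eta
    return float(np.real(np.vdot(d, R_n @ d)))

# Hypothetical scenario with random ATF vectors and noise covariance matrix.
rng = np.random.default_rng(4)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
X = rng.standard_normal((M, 50)) + 1j * rng.standard_normal((M, 50))
R_n = X @ X.conj().T / 50 + 0.1 * np.eye(M)

eta, delta = 0.2, 0.3                    # illustrative values
C = np.column_stack([a, b])
g = np.array([np.conj(a[0]), delta * np.conj(b[0])])
w = blcmv_n_closed_form(R_n, C, g, eta)
print(np.max(np.abs(C.conj().T @ w - g)))   # constraint residual
```

Perturbing the solution within the null space of C^H keeps the constraints satisfied and can only increase the cost, which provides a simple sanity check of the closed form.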
Appendix B Output noise PSD for the BLCMV-N
Using (67) in (16) with instead of , the output PSD of the noise component for the BLCMV-N is given by
[TABLE]
Using (64) and (66), the components in (99) are given by [14]
[TABLE]
Substituting (B) in (99) yields
[TABLE]
with defined in (V-A). Similarly, it can be shown that
[TABLE]
References
- [1] V. Hamacher, U. Kornagel, T. Lotter, and H. Puder, “Binaural signal processing in hearing aids: Technologies and algorithms,” in Advances in Digital Speech Transmission. New York, NY, USA: Wiley, 2008, pp. 401–429.
- [2] S. Doclo, W. Kellermann, S. Makino, and S. E. Nordholm, “Multichannel signal enhancement algorithms for assisted listening devices: Exploiting spatial diversity using multiple microphones,” IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 18–30, Mar. 2015.
- [3] S. Doclo, S. Gannot, D. Marquardt, and E. Hadad, “Binaural speech processing with application to hearing devices,” in Audio Source Separation and Speech Enhancement. Wiley, 2018, ch. 18, pp. 413–442.
- [4] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, MA, USA: MIT Press, 1997.
- [5] K. Kurozumi and K. Ohgushi, “The relationship between the cross-correlation coefficient of two-channel acoustic signals and sound image quality,” The Journal of the Acoustical Society of America, vol. 74, no. 6, pp. 1726–1733, Dec. 1983.
- [6] A. W. Bronkhorst and R. Plomp, “The effect of head-induced interaural time and level differences on speech intelligibility in noise,” The Journal of the Acoustical Society of America, vol. 83, no. 4, pp. 1508–1516, Apr. 1988.
- [7] M. L. Hawley, R. Y. Litovsky, and J. F. Culling, “The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer,” The Journal of the Acoustical Society of America, vol. 115, no. 2, pp. 833–843, Feb. 2004.
- [8] D. P. Welker, J. E. Greenberg, J. G. Desloge, and P. M. Zurek, “Microphone-array hearing aids with binaural output. II. A two-microphone adaptive system,” IEEE Transactions on Speech and Audio Processing, vol. 5, no. 6, pp. 543–551, 1997.
