# deHuBERT: Disentangling Noise in a Self-supervised Model for Robust Speech Recognition

**Authors:** Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Zhao Yang, Jinjie Ni, Chong Zhang, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma

arXiv: 2302.14597 · 2023-03-01

## TL;DR

deHuBERT is a novel self-supervised training framework that enhances speech recognition robustness by disentangling noise from speech representations, improving performance in noisy conditions without sacrificing accuracy on clean data.

## Contribution

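The paper introduces deHuBERT, a training method that extends HuBERT with auxiliary losses. These losses drive the self- and cross-correlation matrices of paired noise-distorted embeddings towards the identity matrix, which encourages noise-agnostic speech embeddings and improves robustness in noisy environments.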

## Key findings

- Improves speech recognition robustness in noisy conditions.
- Preserves performance on the clean test set.
- Remains effective on unseen noise types.

## Abstract

Existing self-supervised pre-trained speech models have offered an effective way to leverage massive unannotated corpora to build good automatic speech recognition (ASR). However, many current models are trained on a clean corpus from a single source and tend to do poorly when noise is present during testing. It is therefore crucial to overcome the adverse influence of noise for real-world applications. In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow's redundancy-reduction principle. The new framework improves the HuBERT training algorithm by introducing auxiliary losses that drive the self- and cross-correlation matrices between pairwise noise-distorted embeddings towards the identity matrix. This encourages the model to produce noise-agnostic speech representations. With this method, we report improved robustness in noisy environments, including unseen noises, without impairing the performance on the clean set.
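
To make the redundancy-reduction idea in the abstract concrete, the following is a minimal sketch of a Barlow Twins-style objective that pushes the cross-correlation matrix of two embedding batches towards the identity. It is an illustration under stated assumptions, not the paper's exact losses: the function names (`standardize`, `correlation_matrix`, `identity_loss`), the off-diagonal weight of `0.005`, and the toy embeddings are all hypothetical, and the full paper should be consulted for how deHuBERT combines its self- and cross-correlation terms with the HuBERT objective.

```python
import math


def standardize(batch, eps=1e-12):
    """Normalise each embedding dimension to zero mean and unit variance across the batch."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[d] for row in batch) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((row[d] - means[d]) ** 2 for row in batch) / n)
        for d in range(dims)
    ]
    return [
        [(row[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for row in batch
    ]


def correlation_matrix(za, zb):
    """Cross-correlation between the dimensions of two embedding batches.

    za and zb are lists of N embedding vectors of equal dimension, e.g. the
    embeddings of the same utterances under two different noise distortions.
    Passing the same batch twice gives the self-correlation matrix.
    """
    n = len(za)
    a, b = standardize(za), standardize(zb)
    dims = len(a[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(dims)]
        for i in range(dims)
    ]


def identity_loss(c, off_diag_weight=0.005):
    """Penalise the distance of a correlation matrix from the identity.

    off_diag_weight is an assumed hyperparameter, not a value from the paper.
    """
    size = len(c)
    # Diagonal entries should be 1: matching dimensions agree across views.
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(size))
    # Off-diagonal entries should be 0: different dimensions are decorrelated.
    off_diag = sum(
        c[i][j] ** 2 for i in range(size) for j in range(size) if i != j
    )
    return on_diag + off_diag_weight * off_diag


# Toy embeddings (hypothetical): four frames, two decorrelated dimensions.
view_a = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
# A second view whose dimensions are swapped, so it disagrees with view_a.
view_b = [[row[1], row[0]] for row in view_a]

loss_matched = identity_loss(correlation_matrix(view_a, view_a))
loss_mismatched = identity_loss(correlation_matrix(view_a, view_b))
```

In this toy case `loss_matched` is essentially zero because the two views agree dimension by dimension, while `loss_mismatched` is large because the diagonal of the correlation matrix collapses to zero. In a real training setup the same quantity would be computed on model embeddings with an automatic-differentiation framework rather than on Python lists.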

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14597/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14597/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/2302.14597/full.md

---
Source: https://tomesphere.com/paper/2302.14597