# Batch-normalized joint training for DNN-based distant speech recognition

**Authors:** Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

arXiv: 1703.08471 · 2017-03-27

## TL;DR

This paper proposes jointly training a speech enhancement DNN and a speech recognition DNN within a fully batch-normalized architecture, improving the robustness of distant speech recognition in adverse acoustic conditions.

## Contribution

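The paper proposes a fully batch-normalized architecture for jointly training speech enhancement and speech recognition DNNs. Batch normalization targets the main obstacle to joint training: the output distribution of the enhancement network can change substantially during optimization, so the recognition network would otherwise receive an input that is non-stationary and unnormalized.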
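The effect of normalizing the hand-off between the two networks can be illustrated with a minimal sketch. It is not the authors' implementation, and this summary does not give the paper's exact architecture: it only applies standard training-mode batch normalization to a hypothetical batch of enhancement outputs. The function `batch_norm` and the toy batches `early` and `late` are assumptions made for illustration.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch dimension (training-mode statistics)."""
    n = len(batch)
    dim = len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Biased (population) variance, as used by standard batch normalization.
        var = sum((v - mean) ** 2 for v in col) / n
        scale = gamma / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = (col[i] - mean) * scale + beta
    return out


# Hypothetical enhancement-network outputs at two points during training.
early = [[0.1, -1.0], [0.4, 0.5], [-0.3, 1.5], [0.8, -0.2]]
# Same pattern, but the scale and offset have drifted (assumed drift: x -> 3x + 5).
late = [[3.0 * v + 5.0 for v in row] for row in early]

norm_early = batch_norm(early)
norm_late = batch_norm(late)

# Largest element-wise difference between the two normalized batches.
max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(norm_early, norm_late)
    for a, b in zip(row_a, row_b)
)
print(max_gap)
```

Although `late` differs from `early` by a large scale and offset, the two normalized batches are nearly identical, which is the sense in which a recognition network placed after such a layer sees a normalized input even while the enhancement output drifts.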

## Key findings

- Significantly outperforms other competitive solutions, with the advantage most pronounced in challenging acoustic environments
- Improves the robustness of distant speech recognition under adverse acoustic conditions
- Evaluated on different datasets, tasks, and acoustic conditions

## Abstract

Improving distant speech recognition is a crucial step towards flexible human-machine interfaces. Current technology, however, still exhibits a lack of robustness, especially when adverse acoustic conditions are met. Despite the significant progress made in recent years on both speech enhancement and speech recognition, one potential limitation of state-of-the-art technology lies in composing modules that are not well matched because they are not trained jointly. To address this concern, a promising approach consists in concatenating a speech enhancement and a speech recognition deep neural network and jointly updating their parameters as if they were within a single bigger network. Unfortunately, joint training can be difficult because the output distribution of the speech enhancement system may change substantially during the optimization procedure. The speech recognition module would have to deal with an input distribution that is non-stationary and unnormalized. To mitigate this issue, we propose a joint training approach based on a fully batch-normalized architecture. Experiments, conducted using different datasets, tasks and acoustic conditions, revealed that the proposed framework significantly outperforms other competitive solutions, especially in challenging environments.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.08471/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1703.08471/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/1703.08471/full.md

---
Source: https://tomesphere.com/paper/1703.08471