# Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews

**Authors:** Michael Gref, Christoph Schmidt, Sven Behnke, Joachim Köhler

arXiv: 1908.06709 · 2019-08-20

## TL;DR

This paper presents a two-staged acoustic modeling approach that combines noise and reverberation data augmentation with transfer learning to improve speech recognition under difficult recording conditions, spontaneous speech, and speech of elderly people. The approach is evaluated on German oral history interviews.

## Contribution

The paper proposes a two-staged approach to acoustic modeling for tasks where little annotated training data is available: noise and reverberation data augmentation is combined with transfer learning to cope with difficult acoustic recording conditions.

## Key findings

- Achieved a relative average reduction of the word error rate of 19.3% on German oral history interviews.
- Addresses difficult acoustic recording conditions, spontaneous speech, and speech of elderly people.
- Evaluates noise and reverberation data augmentation in combination with transfer learning.
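
To make the "relative average reduction" figure concrete, the following minimal sketch shows how a relative word error rate (WER) reduction is typically computed and averaged. The function names and the example WER values are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper, and the abstract does not spell out exactly how the average was formed.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative reduction (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


def average_relative_reduction(pairs):
    """Average the relative reduction over several (baseline, adapted) pairs."""
    reductions = [relative_wer_reduction(b, a) for b, a in pairs]
    return sum(reductions) / len(reductions)


# Hypothetical WERs in percent, one pair per test set (illustration only).
example_pairs = [(30.0, 24.0), (40.0, 30.0)]
print(f"{average_relative_reduction(example_pairs):.1%}")
```

A relative reduction differs from an absolute one: dropping from 30% to 24% WER is 6 percentage points absolute, but 20% relative.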

## Abstract

In automatic speech recognition, often little training data is available for specific challenging tasks, but training of state-of-the-art automatic speech recognition systems requires large amounts of annotated speech. To address this issue, we propose a two-staged approach to acoustic modeling that combines noise and reverberation data augmentation with transfer learning to robustly address challenges such as difficult acoustic recording conditions, spontaneous speech, and speech of elderly people. We evaluate our approach using the example of German oral history interviews, where a relative average reduction of the word error rate by 19.3% is achieved.
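
As a rough illustration of the noise and reverberation data augmentation mentioned in the abstract, the sketch below convolves a clean signal with a room impulse response and then mixes in noise at a chosen signal-to-noise ratio (SNR). This is a generic textbook formulation in plain Python, not the authors' pipeline; the function names, the toy impulse response, the synthetic signals, and the SNR value are all assumptions made for the example.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution, used here to simulate reverberation."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale the noise so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def augment(speech, impulse_response, noise, snr_db):
    """Apply reverberation first, then additive noise (assumed order)."""
    reverberant = convolve(speech, impulse_response)[:len(speech)]
    return add_noise_at_snr(reverberant, noise, snr_db)


# Toy data standing in for real recordings (hypothetical values).
rng = random.Random(0)
clean = [math.sin(0.1 * i) for i in range(400)]
room = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # crude decaying echo pattern
babble = [rng.gauss(0.0, 1.0) for _ in range(400)]
augmented = augment(clean, room, babble, snr_db=10.0)
```

In practice, recorded impulse responses and real noise recordings would replace the toy `room` and `babble` signals, and an FFT-based convolution would be used for speed.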

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.06709/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1908.06709/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1908.06709/full.md

---
Source: https://tomesphere.com/paper/1908.06709