Multi-Staged Cross-Lingual Acoustic Model Adaption for Robust Speech Recognition in Real-World Applications -- A Case Study on German Oral History Interviews
Michael Gref, Oliver Walter, Christoph Schmidt, Sven Behnke, Joachim Köhler

TL;DR
This paper presents a multi-staged cross-lingual acoustic model adaptation method that significantly improves speech recognition accuracy in a challenging German oral history interview domain by leveraging large-scale multi-language data.
Contribution
It introduces a novel multi-staged cross-lingual adaptation approach that effectively utilizes multi-domain and multi-language data for robust speech recognition.
Findings
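Achieved over 30% relative word error rate reduction on German oral history interviews compared to a model trained from scratch only on target-domain data.
Outperformed a model trained robustly on 1000 hours of same-language out-of-domain data by 6-7% relative.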
Demonstrated effectiveness of multi-staged cross-lingual adaptation in real-world scenarios.
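The percentages above are relative reductions, meaning fractions of a baseline word error rate rather than absolute percentage points. A minimal sketch of that arithmetic follows; the helper name and the WER values are hypothetical and are not taken from the paper:

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Made-up values for illustration only (assumption, not reported results).
baseline = 0.30  # WER of a hypothetical baseline model
adapted = 0.21   # WER of a hypothetical adapted model
print(f"relative reduction: {relative_wer_reduction(baseline, adapted):.1%}")
```

With these illustrative inputs, an absolute drop of 9 percentage points corresponds to a relative reduction of about 30%, which is why relative and absolute figures should not be compared directly.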
Abstract
While recent automatic speech recognition systems achieve remarkable performance when large amounts of adequate, high quality annotated speech data is used for training, the same systems often only achieve an unsatisfactory result for tasks in domains that greatly deviate from the conditions represented by the training data. For many real-world applications, there is a lack of sufficient data that can be directly used for training robust speech recognition systems. To address this issue, we propose and investigate an approach that performs a robust acoustic model adaption to a target domain in a cross-lingual, multi-staged manner. Our approach enables the exploitation of large-scale training data from other domains in both the same and other languages. We evaluate our approach using the challenging task of German oral history interviews, where we achieve a relative reduction of the word…
