Human and Automatic Speech Recognition Performance on German Oral History Interviews

Michael Gref; Nike Matthiesen; Christoph Schmidt; Sven Behnke; Joachim Köhler

arXiv:2201.06841 · eess.AS · January 19, 2022 · 1 cites

TL;DR

This study compares human and automatic transcription accuracy on German oral history interviews, revealing a significant gap and demonstrating improvements in machine models through adaptation techniques.

Contribution

It provides the first detailed comparison of human and machine transcription performance on German oral history data and explores model adaptation for improved accuracy.

Findings

01

Human WER estimated at 8.7% for clean interviews

02

Machine models achieved 23.9% WER on noisy data

03

Model adaptation reduced WER by 5-8% relative

Abstract

Automatic speech recognition systems have accomplished remarkable improvements in transcription accuracy in recent years. On some domains, models now achieve near-human performance. However, transcription performance on oral history has not yet reached human accuracy. In the present work, we investigate how large this gap between human and machine transcription still is. For this purpose, we analyze and compare transcriptions of three humans on a new oral history data set. We estimate a human word error rate of 8.7% for recent German oral history interviews with clean acoustic conditions. For comparison with recent machine transcription accuracy, we present experiments on the adaptation of an acoustic model achieving near-human performance on broadcast speech. We investigate the influence of different adaptation data on robustness and generalization for clean and noisy oral history…
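
The word error rate (WER) quoted above is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition only. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration, not the authors' scoring pipeline, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of five reference words.
print(word_error_rate("das ist ein kurzes interview", "das ist kurzes interview"))  # 0.2
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100% for very poor hypotheses. When comparing human transcribers, the choice of which transcript serves as the reference affects the resulting figure.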

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing