# English Broadcast News Speech Recognition by Humans and Machines

**Authors:** Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko

arXiv: 1904.13258 · 2019-05-01

## TL;DR

This paper evaluates recent deep learning speech recognition techniques, originally developed for conversational telephone speech, on broadcast news. It also compares machine accuracy with measured human accuracy and finds that current systems approach, but still fall short of, human recognition performance.

## Contribution

The study applies LSTM and residual network acoustic models, combined with n-gram and neural network language models, to broadcast news recognition. It shows that techniques developed for conversational telephone speech transfer to this task and quantifies the remaining gap to human performance.

## Key findings

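- Machine WER: 6.5% on DEV04F and 5.9% on RT04, new performance milestones on both public BN test sets.
- Best measured human WER: 3.6% on DEV04F and 2.8% on RT04, leaving clear room for improvement.
- Techniques developed for conversational telephone speech transfer to broadcast news, but machine accuracy still falls short of human accuracy.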
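Word error rate (WER), the metric behind all four figures above, is the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the scoring pipeline used in the paper: real evaluations also apply transcript normalization and scoring rules that it omits, and the function names and toy utterances are hypothetical. It then turns the reported numbers into the absolute and relative machine-to-human gap.

```python
from typing import Iterable, Sequence, Tuple


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))  # empty reference: j insertions
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)  # empty hypothesis: i deletions
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over all utterances."""
    errors = words = 0
    for ref, hyp in pairs:
        r, h = ref.split(), hyp.split()
        errors += word_errors(r, h)
        words += len(r)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


if __name__ == "__main__":
    # Hypothetical toy utterances: one deletion and one substitution over 9 reference words.
    pairs = [
        ("the president spoke today", "the president spoke"),
        ("markets closed higher on friday", "markets closed higher on thursday"),
    ]
    print(f"toy WER: {corpus_wer(pairs):.1%}")  # toy WER: 22.2%

    # Reported WERs (%) from the abstract: (machine, best measured human).
    reported = {"DEV04F": (6.5, 3.6), "RT04": (5.9, 2.8)}
    for name, (machine, human) in reported.items():
        print(f"{name}: gap {machine - human:.1f} points, "
              f"machine WER {machine / human:.1f}x human")
    # DEV04F: gap 2.9 points, machine WER 1.8x human
    # RT04: gap 3.1 points, machine WER 2.1x human
```

Pooling errors and reference words across utterances, rather than averaging per-utterance WERs, keeps short utterances from dominating the score.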

## Abstract

With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition. In this paper we evaluate the usefulness of these proposed techniques on broadcast news (BN), a similar challenging task. We also perform a set of recognition measurements to understand how close the achieved automatic speech recognition results are to human performance on this task. On two publicly available BN test sets, DEV04F and RT04, our speech recognition system using LSTM and residual network based acoustic models with a combination of n-gram and neural network language models performs at 6.5% and 5.9% word error rate. By achieving new performance milestones on these test sets, our experiments show that techniques developed on other related tasks, like CTS, can be transferred to achieve similar performance. In contrast, the best measured human recognition performance on these test sets is much lower, at 3.6% and 2.8% respectively, indicating that there is still room for new techniques and improvements in this space, to reach human performance levels.
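The abstract pairs n-gram and neural network language models but, in this summary, does not say how they are combined. One common way to combine such models is linear interpolation of their word probabilities, followed by rescoring an N-best list of recognizer hypotheses with the combined model. The sketch below illustrates only that generic technique; it is an assumption, not the paper's recipe, and the toy unigram tables, the interpolation weight `lam`, the `lm_weight` scale, and all scores are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# An LM maps (history, next word) -> probability; history is a tuple of words.
LM = Callable[[Tuple[str, ...], str], float]


def interpolate(ngram: LM, neural: LM, lam: float) -> LM:
    """Linear interpolation: P(w | h) = lam * P_neural(w | h) + (1 - lam) * P_ngram(w | h)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    def combined(history: Tuple[str, ...], word: str) -> float:
        return lam * neural(history, word) + (1.0 - lam) * ngram(history, word)

    return combined


def lm_logprob(lm: LM, words: Sequence[str]) -> float:
    """Sum of log P(w_i | w_1 .. w_{i-1}) over one hypothesis."""
    return sum(math.log(lm(tuple(words[:i]), w)) for i, w in enumerate(words))


def rescore(nbest: List[Tuple[List[str], float]], lm: LM, lm_weight: float) -> List[str]:
    """Pick the hypothesis maximizing acoustic log score + lm_weight * LM log probability."""
    best_words, _ = max(nbest, key=lambda hyp: hyp[1] + lm_weight * lm_logprob(lm, hyp[0]))
    return best_words


if __name__ == "__main__":
    # Hypothetical unigram tables standing in for real n-gram and neural LMs.
    ngram_probs = {"the": 0.5, "cat": 0.1, "cap": 0.05}
    neural_probs = {"the": 0.5, "cat": 0.3, "cap": 0.01}
    lm = interpolate(lambda h, w: ngram_probs[w], lambda h, w: neural_probs[w], lam=0.5)

    # N-best list of (words, acoustic log score); the acoustics slightly prefer "cap".
    nbest = [(["the", "cat"], -10.0), (["the", "cap"], -9.5)]
    print(rescore(nbest, lm, lm_weight=1.0))  # ['the', 'cat']
    print(rescore(nbest, lm, lm_weight=0.0))  # ['the', 'cap']
```

Interpolating in the probability domain keeps the combined model a proper distribution; in practice `lam` and `lm_weight` would typically be tuned on held-out data rather than fixed by hand.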

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.13258/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1904.13258/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1904.13258/full.md

---
Source: https://tomesphere.com/paper/1904.13258