The Airbus Air Traffic Control speech recognition 2018 challenge: towards ATC automatic transcription and call sign detection

Thomas Pellegrini; Jérôme Farinas; Estelle Delpech; François Lancelot

arXiv:1810.12614 · cs.SD · March 11, 2020

TL;DR

This paper reports on the 2018 Airbus challenge for automatic ATC speech transcription and call sign detection, highlighting the difficulty of processing diverse, noisy ATC speech and presenting the performance of the top-ranked systems.

Contribution

It introduces a benchmark dataset and evaluation for ATC speech recognition and call sign detection, and analyzes the challenges and results of participating systems.

Findings

01. Best word error rate (WER) achieved was 7.62%

02. Best call sign detection (CSD) F1-score was 82.41%

03. Transcribing pilots' speech is twice as difficult as transcribing controllers' speech
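WER, the transcription metric behind the 7.62% figure, is the number of word substitutions, deletions and insertions needed to turn the system output into the reference, divided by the number of reference words. The page does not name the challenge's scoring tool, so the following is only a minimal sketch of the standard definition, assuming whitespace-tokenized, already-normalized text and corpus-level aggregation (total edits over total reference words). The function names and the example utterances are hypothetical.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of substitutions + deletions + insertions turning ref into hyp."""
    # Rolling row of the Levenshtein table: at the start of iteration i,
    # prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        edits += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return edits / ref_words


# Hypothetical (reference, hypothesis) pairs in spoken ATC style.
pairs = [
    ("air france one two three climb flight level three one zero",
     "air france one two three climb level tree one zero"),  # 1 deletion + 1 substitution
    ("speedbird four five contact tower",
     "speedbird four five contact tower"),                   # exact match
]
print(corpus_wer(pairs))  # → 0.125  (2 edits / 16 reference words)
```

Aggregating edits over the whole set, rather than averaging per-utterance WERs, keeps short utterances from dominating the score; whether the challenge used exactly this convention is an assumption here.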

Abstract

In this paper, we describe the outcomes of the challenge organized and run by Airbus and partners in 2018. The challenge consisted of two tasks applied to Air Traffic Control (ATC) speech in English: 1) automatic speech-to-text transcription, and 2) call sign detection (CSD). The registered participants were provided with 40 hours of speech along with manual transcriptions. Twenty-two teams submitted predictions on a five-hour evaluation set. ATC speech processing is challenging for several reasons: high speech rate, foreign-accented speech with a great diversity of accents, and noisy communication channels. The best-ranked team achieved a 7.62% Word Error Rate and an 82.41% CSD F1-score. Transcribing pilots' speech was found to be twice as hard as transcribing controllers' speech. Remaining issues towards solving ATC automatic speech recognition (ASR) are also discussed.
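The abstract reports CSD quality as an F1-score, the harmonic mean of precision and recall, but does not spell out the matching rule. A minimal sketch, assuming each evaluation utterance carries at most one reference call sign (or none) and that a predicted call sign counts only on an exact string match; both are assumptions, and the function name `csd_scores` and the example call signs are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def csd_scores(references: Sequence[Optional[str]],
               predictions: Sequence[Optional[str]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for per-utterance call sign detection.

    Each entry is the call sign of one utterance, or None if it has none.
    Exact string match only (hypothetical scoring rule).
    """
    if len(references) != len(predictions):
        raise ValueError("one prediction is needed per reference utterance")
    tp = fp = fn = 0
    for ref, pred in zip(references, predictions):
        if pred is not None and pred == ref:
            tp += 1  # correct call sign detected
        else:
            if pred is not None:
                fp += 1  # spurious or wrong call sign predicted
            if ref is not None:
                fn += 1  # reference call sign missed or mis-recognized
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


refs = ["air france one two three", "speedbird four five", None, "easy eight"]
preds = ["air france one two three", "speedbird five four", "lufthansa one", None]
print(csd_scores(refs, preds))  # roughly one third for all three scores
```

In this sketch a wrong call sign is penalized twice, as a false positive and a false negative, which is a common convention for exact-match detection scoring but not necessarily the one used by the challenge.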

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Hand Gesture Recognition Systems