Automatic Speech Recognition Benchmark for Air-Traffic Communications
Juan Zuluaga-Gomez, Petr Motlicek, Qingran Zhan, Karel Vesely, and Rudolf Braun

TL;DR
This paper presents a benchmark of state-of-the-art ASR models trained on extensive air-traffic control speech data, demonstrating high accuracy and robustness across accents, with potential to enhance ATC communication systems.
Contribution
It introduces a large-scale ATC speech dataset and evaluates multiple ASR models, showing improved performance and addressing accent-related challenges in air-traffic communication.
Findings
Achieved an average WER of 7.75% across four datasets.
Training with byte-pair encoding reduced WER by 35% (relative).
Cross-accent performance improved with increased data volume.
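The WER figures above can be made concrete with a minimal sketch. It assumes the standard definition of word error rate (word-level edit distance divided by the number of reference words), takes the "average" as a plain mean of per-dataset WERs, and reads the 35% as a relative reduction. The function names and the sample utterance are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (edit distance in words, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1], len(ref)


def dataset_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER over one dataset: total word errors / total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words


def mean_wer(per_dataset_wer: Iterable[float]) -> float:
    """Plain (macro) mean of per-dataset WERs; an assumed averaging scheme."""
    values = list(per_dataset_wer)
    return sum(values) / len(values)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Relative WER reduction, e.g. 0.35 for a 35% relative improvement."""
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical ATC-style utterance: the hypothesis drops the word "flight",
# giving 1 error over 9 reference words.
sample = [(
    "lufthansa four five six descend flight level eight zero",
    "lufthansa four five six descend level eight zero",
)]
sample_wer = dataset_wer(sample)
```

Under these assumptions, a relative reduction of 35% means a baseline WER of 10.0% would fall to 6.5%, not to a negative value as an absolute reading would imply.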
Abstract
Advances in Automatic Speech Recognition (ASR) over the last decade have opened new areas of speech-based automation, such as the Air-Traffic Control (ATC) environment. Currently, voice communication and data-link communication are the only means of contact between pilots and Air-Traffic Controllers (ATCo); the former is the most widely used, while the latter is a non-spoken method that is mandatory for oceanic messages and used only to a limited extent for domestic ones. ASR systems in ATCo environments face added complexity due to accents of non-native English speakers, cockpit noise, speaker-dependent biases, and the small size of in-domain ATC databases available for training. Here, we introduce CleanSky EC-H2020 ATCO2, a project that aims to develop an ASR-based platform to collect, organize, and automatically pre-process ATCo speech data from airspace. This paper conveys an exploratory benchmark of several state-of-the-art…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing
