Automatic Speech Recognition Benchmark for Air-Traffic Communications

Juan Zuluaga-Gomez; Petr Motlicek; Qingran Zhan; Karel Vesely; and Rudolf Braun

arXiv:2006.10304·cs.CL·August 14, 2020·5 cites

TL;DR

This paper benchmarks several state-of-the-art ASR models on air-traffic control (ATC) speech, reporting an average WER of 7.75% across four datasets and finding that cross-accent performance improves as the volume of training data grows.

Contribution

It introduces ATCO2, a project that aims to build an ASR-based platform to collect, organize, and pre-process ATC speech data, and it evaluates multiple ASR models on ATC speech, addressing accent-related challenges in air-traffic communication.

Findings

01 Achieved an average WER of 7.75% across four datasets.

02 Training with byte-pair encoding reduced WER by 35%.

03 Cross-accent performance improved with increased data volume.
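
The findings above are stated in terms of word error rate (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the snippet below may help; the function name `word_error_rate` and the example utterances are hypothetical illustrations, not taken from the paper or its datasets, and the paper's own scoring pipeline may differ (for example in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up ATC-style utterance: the hypothesis drops the word "flight".
reference = "lufthansa three two one descend flight level eight zero"
hypothesis = "lufthansa three two one descend level eight zero"
print(f"{word_error_rate(reference, hypothesis):.3f}")  # prints 0.111
```

An average over several test sets, such as the 7.75% figure, would then be an aggregate of per-set scores of this kind; how the paper weights the four datasets is not stated here.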

Abstract

Advances in Automatic Speech Recognition (ASR) over the last decade have opened new areas of speech-based automation, such as the Air-Traffic Control (ATC) environment. Currently, voice communication and data-link communication are the only means of contact between pilots and Air-Traffic Controllers (ATCo); the former is the most widely used, and the latter is a non-spoken method that is mandatory for oceanic messages and limited for some domestic issues. ASR systems in ATCo environments face increased complexity due to accents from non-English speakers, cockpit noise, speaker-dependent biases, and small in-domain ATC databases for training. Here, we introduce CleanSky EC-H2020 ATCO2, a project that aims to develop an ASR-based platform to collect, organize, and automatically pre-process ATCo speech data from airspace. This paper presents an exploratory benchmark of several state-of-the-art…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing