ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems

Yi Lin; Bo Yang; Linchao Li; Dongyue Guo; Jianwei Zhang; Hu Chen; Yi Zhang

arXiv:2102.08535·cs.CL·February 18, 2021

TL;DR

ATCSpeechNet is a novel multilingual end-to-end speech recognition framework for air traffic control that effectively utilizes unlabeled data and achieves high accuracy with minimal labeled samples.

Contribution

The paper introduces ATCSpeechNet, integrating speech representation learning with unsupervised pre-training for multilingual ATC speech recognition in a single end-to-end model.

Findings

01

Achieves 4.20% label error rate on 58-hour corpus

02

Over 100% relative performance improvement over baseline

03

Effective with small labeled datasets
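
The label error rate quoted in the first finding is not defined in this excerpt. A minimal sketch, assuming the common definition (total edit distance between reference and predicted label sequences, divided by the total number of reference labels, as a percentage), is shown below. The function names `edit_distance` and `label_error_rate` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences."""
    # prev[j] holds the distance between the first i-1 reference labels
    # and the first j hypothesis labels.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(refs: List[Sequence[str]],
                     hyps: List[Sequence[str]]) -> float:
    """Corpus-level label error rate in percent (assumed definition)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_labels = sum(len(r) for r in refs)
    if total_labels == 0:
        raise ValueError("reference corpus contains no labels")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_labels
```

For a multilingual corpus, the labels could be characters for one language and letters or words for another; the sketch only assumes each utterance is already split into a sequence of labels.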

Abstract

In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to tackle the problem of translating communication speech into human-readable text in air traffic control (ATC) systems. The proposed framework integrates multilingual automatic speech recognition (ASR) into one model, in which an end-to-end paradigm converts the speech waveform into text directly, without any feature engineering or lexicon. To compensate for the shortcomings of handcrafted feature engineering under ATC-specific challenges, a speech representation learning (SRL) network is proposed to capture robust and discriminative speech representations from the raw waveform. A self-supervised training strategy is adopted to optimize the SRL network on unlabeled data and then to predict the speech features, i.e., wave-to-feature. An end-to-end architecture is…
