Speech recognition for air traffic control via feature learning and end-to-end training
Peng Fan, Dongyue Guo, Yi Lin, Bo Yang, Jianwei Zhang

TL;DR
This paper introduces an end-to-end speech recognition system for air traffic control that learns features directly from raw waveforms, improving accuracy on multilingual ATC speech data.
Contribution
The approach combines feature learning from raw waveforms with end-to-end (waveform-to-text) training for ATC speech recognition, and it also addresses the multilingual nature of ATC speech.
Findings
Achieved a 6.9% character error rate (CER) on the ATCSpeech corpus.
Outperformed baseline models in multilingual ATC speech recognition.
Validated the effectiveness of raw waveform feature learning in complex environments.
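The character error rate quoted above is conventionally the character-level edit distance between the recognized text and the reference transcript, divided by the number of reference characters: CER = (S + D + I) / N, where S, D and I are substitutions, deletions and insertions, and N is the reference length. The sketch below computes a corpus-level CER under one stated assumption: every character (including spaces) is counted as-is. The paper's exact text normalization for mixed Chinese/English transcripts is not given here, and the function names are illustrative.

```python
from typing import Sequence

def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference character r
                curr[j - 1] + 1,         # insert hypothesis character h
                prev[j - 1] + (r != h),  # substitute (cost 0 when equal)
            )
        prev = curr
    return prev[-1]

def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    Assumption: no normalization; spaces and punctuation count as characters.
    """
    if len(refs) != len(hyps):
        raise ValueError("need exactly one hypothesis per reference")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)

if __name__ == "__main__":
    # One inserted character ("to" -> "two") over a 21-character reference.
    print(character_error_rate(["climb to flight level"], ["climb two flight level"]))
```

Summing edits over the whole corpus before dividing (rather than averaging per-utterance rates) keeps long utterances from being under-weighted, which is the usual way CER is reported for a test set.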
Abstract
In this work, we propose a new automatic speech recognition (ASR) system for air traffic control (ATC) based on feature learning and an end-to-end training procedure. The proposed model integrates a feature learning block, recurrent neural network (RNN) layers, and a connectionist temporal classification (CTC) loss into an end-to-end ASR model. To cope with the complex environment of ATC speech, a learning block is designed to extract informative features directly from raw waveforms for acoustic modeling, instead of relying on handcrafted features. Both SincNet and 1D convolution blocks process the raw waveforms, and their outputs are concatenated and fed to the RNN layers for temporal modeling. Because it learns representations from raw waveforms, the proposed model can be optimized in a fully end-to-end manner, i.e., from waveform to text. Finally, the multilingual issue in the…
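To make the two-branch front end more concrete, here is a minimal pure-Python sketch, not the authors' implementation. The SincNet branch uses the standard SincNet parameterization: a Hamming-windowed band-pass filter g[n] = 2f₂·sinc(2πf₂n) − 2f₁·sinc(2πf₁n), whose cut-offs f₁, f₂ would be learned (they are fixed here). The plain 1D convolution branch uses random kernels as stand-ins for learned weights. Both branches slide over the same raw waveform with the same kernel size and stride, and their channels are concatenated frame by frame into the feature sequence an RNN would consume. All function names, filter counts, cut-offs, kernel size and stride are hypothetical, and the RNN and CTC layers are omitted.

```python
import math
import random
from typing import List, Sequence

def sinc_bandpass_kernel(f1_hz: float, f2_hz: float, kernel_size: int,
                         sample_rate: int) -> List[float]:
    """Hamming-windowed band-pass g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n).

    f1 and f2 are cut-off frequencies normalised by the sample rate. In SincNet
    they are the only trainable parameters of each filter; here they are fixed.
    """
    if kernel_size < 3 or kernel_size % 2 == 0:
        raise ValueError("use an odd kernel size >= 3 so the filter is centred on n = 0")
    lo, hi = f1_hz / sample_rate, f2_hz / sample_rate
    half = kernel_size // 2

    def sinc(x: float) -> float:
        return 1.0 if x == 0.0 else math.sin(x) / x

    kernel = []
    for k in range(kernel_size):
        n = k - half  # time index centred on zero
        band = 2 * hi * sinc(2 * math.pi * hi * n) - 2 * lo * sinc(2 * math.pi * lo * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (kernel_size - 1))
        kernel.append(band * window)
    return kernel

def conv1d(signal: Sequence[float], kernel: Sequence[float], stride: int) -> List[float]:
    """'Valid' sliding dot product (cross-correlation, as in DL frameworks)."""
    width = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[start:start + width]))
        for start in range(0, len(signal) - width + 1, stride)
    ]

def front_end(waveform: Sequence[float], sinc_kernels: Sequence[Sequence[float]],
              conv_kernels: Sequence[Sequence[float]], stride: int) -> List[List[float]]:
    """Run both branches on the raw waveform and concatenate their channels."""
    channels = [conv1d(waveform, k, stride) for k in sinc_kernels]   # SincNet branch
    channels += [conv1d(waveform, k, stride) for k in conv_kernels]  # plain 1D conv branch
    n_frames = min(len(c) for c in channels)
    # Transpose to (frames, channels): each row is one feature vector for the RNN.
    return [[c[t] for c in channels] for t in range(n_frames)]

if __name__ == "__main__":
    random.seed(0)
    sr = 8000  # hypothetical sample rate
    wave = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]  # 0.1 s tone
    sinc_bank = [sinc_bandpass_kernel(lo, lo + 300, 101, sr) for lo in (100, 400, 1000)]
    conv_bank = [[random.gauss(0.0, 0.1) for _ in range(101)] for _ in range(2)]
    feats = front_end(wave, sinc_bank, conv_bank, stride=80)  # 10 ms hop at 8 kHz
    print(len(feats), len(feats[0]))  # frames, concatenated channels
```

With an 800-sample input, 101-tap kernels and a stride of 80, the sketch yields 9 frames of 5 channels (3 SincNet + 2 convolution). The appeal of the SincNet branch is that each filter has only two parameters and stays an interpretable band-pass. The unconstrained convolution branch can learn arbitrary shapes, and concatenating the two lets the downstream layers draw on both.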
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Convolution
