Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems
Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Karel Veselý, Martin Kocour, Igor Szöke

TL;DR
This paper introduces a semi-supervised learning approach that incorporates air-surveillance contextual knowledge to improve automatic speech recognition accuracy for air traffic control communications, especially for unseen domains.
Contribution
The paper presents a novel two-step method: it encodes air-surveillance contextual knowledge in a weighted finite-state transducer (WFST), then applies that knowledge during semi-supervised learning to improve callsign recognition in ATC speech recognition systems.
Findings
Achieved a 32.1% relative improvement in callsign word error rate (CA-WER) with semi-supervised learning (SSL).
Gained a further 17.5% CA-WER improvement by adding contextual knowledge during SSL.
Contextual SSL helps most on unseen-domain data, such as recordings from new airports.
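The findings above are stated as relative changes in a callsign-only error rate. As a rough illustration of what such numbers mean, the sketch below computes a word error rate restricted to callsign words and the relative improvement between two error rates. It assumes the callsign portion of each reference and hypothesis has already been extracted; the function names and the example strings are hypothetical, and this summary does not give the paper's exact scoring procedure.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def callsign_wer(pairs):
    """Error rate over callsign words only.

    pairs: iterable of (reference_callsign, hypothesis_callsign) strings.
    Assumption: the callsign span was extracted beforehand.
    """
    pairs = list(pairs)
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words


def relative_improvement(baseline, improved):
    """Fraction of the baseline error rate that was removed."""
    return (baseline - improved) / baseline
```

For example, with the made-up pairs ("lufthansa three two one", "lufthansa three two nine") and ("speedbird five alpha", "speedbird five alpha"), one of seven callsign words is wrong. A relative improvement of 32.1% means the error rate dropped to 67.9% of its baseline value, not that it fell by 32.1 absolute points.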
Abstract
Air traffic management and specifically air-traffic control (ATC) rely mostly on voice communications between Air Traffic Controllers (ATCos) and pilots. In most cases, these voice communications follow a well-defined grammar that could be leveraged in Automatic Speech Recognition (ASR) technologies. The callsign used to address an airplane is an essential part of all ATCo-pilot communications. We propose a two-step approach to add contextual knowledge during semi-supervised training to reduce the ASR system's error rates in recognizing the part of the utterance that contains the callsign. First, we represent the contextual knowledge (i.e., air-surveillance data) of an ATCo-pilot communication in a WFST. Then, during Semi-Supervised Learning (SSL), the contextual knowledge is added by second-pass decoding (i.e., lattice re-scoring). Results show that 'unseen domains' (e.g. data from…
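The two steps in the abstract can be illustrated with a much simplified stand-in. The sketch below does not build a WFST or rescore lattices. Instead, it expands surveillance callsigns into spoken word sequences and adds a fixed bonus to any n-best hypothesis that contains one. The airline table, the `boost` value, and all function names are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical lookup tables (assumptions for this sketch).
AIRLINES = {"DLH": "lufthansa", "BAW": "speedbird"}
DIGITS = dict(zip("0123456789",
                  "zero one two three four five six seven eight nine".split()))
LETTERS = dict(zip(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    ("alfa bravo charlie delta echo foxtrot golf hotel india juliett kilo "
     "lima mike november oscar papa quebec romeo sierra tango uniform "
     "victor whiskey xray yankee zulu").split()))


def verbalize(callsign):
    """Expand an ICAO-style callsign such as 'DLH4AB' into spoken words."""
    airline = AIRLINES[callsign[:3]]
    suffix = [DIGITS[c] if c in DIGITS else LETTERS[c] for c in callsign[3:]]
    return " ".join([airline] + suffix)


def rescore(nbest, callsigns, boost=2.0):
    """Pick the best hypothesis after boosting those that match the context.

    nbest: list of (text, log_score) from a first decoding pass.
    callsigns: callsigns reported by surveillance data for this utterance.
    boost: bonus added to the log score on a match (assumed value).
    """
    phrases = [verbalize(c) for c in callsigns]
    rescored = []
    for text, score in nbest:
        padded = f" {text} "  # pad so matches align on word boundaries
        matched = any(f" {p} " in padded for p in phrases)
        rescored.append((text, score + (boost if matched else 0.0)))
    return max(rescored, key=lambda item: item[1])
```

In a semi-supervised setup, the hypothesis returned by `rescore` would serve as the automatic transcript for an untranscribed utterance, so callsigns known to be in the airspace are favored over acoustically similar alternatives. A real system would apply this bias to every path in the lattice rather than to a short n-best list.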
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
