Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems

Juan Zuluaga-Gomez; Iuliia Nigmatulina; Amrutha Prasad; Petr Motlicek; Karel Veselý; Martin Kocour; Igor Szöke

arXiv:2104.03643 · cs.CL · August 30, 2021 · 6 cites

Open Access

TL;DR

This paper introduces a semi-supervised learning approach that incorporates air-surveillance contextual knowledge to improve automatic speech recognition accuracy for air traffic control communications, especially for unseen domains.

Contribution

It presents a novel two-step method that combines a weighted finite-state transducer (WFST) representation of contextual knowledge with semi-supervised learning to improve callsign recognition in ATC speech recognition systems.

Findings

01 Achieved a 32.1% relative improvement in callsign word error rate (CA-WER) with SSL.

02 Further improved CA-WER by 17.5% by adding contextual knowledge during SSL.

03 Particularly effective on unseen-domain data, such as recordings from new airports.
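The findings are reported in CA-WER, a word error rate computed only over the callsign portion of each utterance. The following is a minimal sketch of how such a metric, and the relative improvements quoted above, could be computed. It assumes the callsign word spans have already been cut out of both the reference and the ASR hypothesis; that extraction step, the function names, and the toy data are hypothetical, not the authors' tooling.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference word deleted
                curr[j - 1] + 1,         # extra hypothesis word inserted
                prev[j - 1] + (r != h),  # substitution, free if the words match
            )
        prev = curr
    return prev[-1]


def callsign_wer(ref_spans: List[List[str]], hyp_spans: List[List[str]]) -> float:
    """WER restricted to the callsign words of each utterance.

    Hypothetical input format: one list of callsign words per utterance,
    already extracted from the reference and from the ASR hypothesis.
    """
    if len(ref_spans) != len(hyp_spans):
        raise ValueError("need one hypothesis span per reference span")
    ref_words = sum(len(r) for r in ref_spans)
    if ref_words == 0:
        raise ValueError("reference contains no callsign words")
    errors = sum(edit_distance(r, h) for r, h in zip(ref_spans, hyp_spans))
    return errors / ref_words


def relative_improvement(baseline: float, new: float) -> float:
    """Relative error reduction: 0.321 would correspond to a 32.1% improvement."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Toy data (hypothetical): "three" misrecognized as "four" in the first callsign.
    ref = [["lufthansa", "one", "two", "three"], ["speedbird", "four", "five"]]
    hyp = [["lufthansa", "one", "two", "four"], ["speedbird", "four", "five"]]
    print(callsign_wer(ref, hyp))            # 1 error over 7 callsign words
    print(relative_improvement(0.20, 0.15))  # about 0.25, i.e. 25% relative
```

Scoring only the callsign span keeps errors elsewhere in the utterance (instructions, flight levels) from masking how well the callsign itself is recognized.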

Abstract

Air traffic management, and specifically air traffic control (ATC), relies mostly on voice communications between Air Traffic Controllers (ATCos) and pilots. In most cases, these voice communications follow a well-defined grammar that could be leveraged in Automatic Speech Recognition (ASR) technologies. The callsign used to address an airplane is an essential part of all ATCo-pilot communications. We propose a two-step approach that adds contextual knowledge during semi-supervised training to reduce ASR error rates on the part of the utterance that contains the callsign. First, we represent the contextual knowledge (i.e. air-surveillance data) of an ATCo-pilot communication as a Weighted Finite-State Transducer (WFST). Then, during Semi-Supervised Learning (SSL), the contextual knowledge is added by second-pass decoding (i.e. lattice re-scoring). Results show that 'unseen domains' (e.g. data from…
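The two steps in the abstract encode air-surveillance callsigns as a WFST and then boost them during second-pass lattice rescoring. The idea can be illustrated with a deliberately simplified stand-in: the sketch below swaps the lattice for an n-best list and the WFST for a set of verbalized callsign word sequences. The telephony and digit tables, the `bonus` weight, the scores, and all function names are hypothetical assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical lookup tables: airline designator -> spoken telephony name,
# and digit -> spoken word ("niner" for 9, as is common in ATC speech).
TELEPHONY = {"DLH": "lufthansa", "BAW": "speedbird", "SWR": "swiss"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "niner"}


def verbalize(callsign: str) -> Tuple[str, ...]:
    """Turn a surveillance callsign such as 'DLH123' into its spoken word sequence."""
    airline, number = callsign[:3], callsign[3:]
    words = [TELEPHONY.get(airline, airline.lower())]
    words += [DIGITS.get(ch, ch.lower()) for ch in number]
    return tuple(words)


def contains_span(words: Sequence[str], span: Tuple[str, ...]) -> bool:
    """True if `span` occurs as a contiguous run of words in `words`."""
    n = len(span)
    return any(tuple(words[i:i + n]) == span for i in range(len(words) - n + 1))


def rescore(nbest: List[Tuple[str, float]], surveillance: List[str],
            bonus: float = 2.0) -> List[Tuple[str, float]]:
    """Second pass over an n-best list (a stand-in for lattice rescoring).

    Hypotheses containing a callsign currently reported by air surveillance get
    `bonus` added to their log-score; the list is then re-sorted, best first.
    """
    expected = {verbalize(cs) for cs in surveillance}
    rescored = []
    for text, score in nbest:
        words = text.split()
        if any(contains_span(words, span) for span in expected):
            score += bonus
        rescored.append((text, score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy first-pass output (hypothetical scores): the wrong callsign ranks first.
    nbest = [
        ("lufthansa one two four descend flight level eight zero", -10.0),
        ("lufthansa one two three descend flight level eight zero", -11.0),
    ]
    for text, score in rescore(nbest, surveillance=["DLH123", "BAW45"]):
        print(f"{score:6.1f}  {text}")
```

With `DLH123` in the surveillance list, the "lufthansa one two three" hypothesis gains the bonus (-11.0 + 2.0 = -9.0) and overtakes the first-pass winner. A full implementation would apply the weighted callsign grammar to the lattice itself, which keeps every competing path instead of only the top N.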


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques