A Comparative Study of Speaker Role Identification in Air Traffic Communication Using Deep Learning Approaches
Dongyue Guo, Jianwei Zhang, Bo Yang, Yi Lin

TL;DR
This paper compares text-based, speech-based, and multi-modal deep learning methods for identifying whether the controller or the pilot is speaking in air traffic control conversations. The proposed multi-modal network reaches over 98% accuracy on the ATCSpeech corpus.
Contribution
The paper introduces MMSRINet, a multi-modal neural network for speaker role identification (SRI) in ATC communication that integrates speech and text features with attention mechanisms.
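To make the idea of attention-based fusion of speech and text features concrete, here is a minimal NumPy sketch. It is not the authors' network: the dimensions, the random untrained weights, the helper names (attention_pool, predict_role), and the convention that label 1 means "pilot" are all assumptions made for illustration. It only shows the general pattern of pooling each modality with attention, weighting the two modalities against each other, and applying a binary classifier.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(sequence, query):
    """Collapse a (T, d) sequence into one (d,) vector via dot-product attention."""
    scores = sequence @ query / np.sqrt(sequence.shape[1])
    weights = softmax(scores)
    return weights @ sequence, weights

def predict_role(speech_frames, text_tokens, params):
    """Return P(label == 1) and the attention weight given to each modality."""
    speech_vec, _ = attention_pool(speech_frames, params["q_speech"])
    text_vec, _ = attention_pool(text_tokens, params["q_text"])
    # Modality-level attention: decide how much to trust speech vs. text.
    stacked = np.stack([speech_vec, text_vec])            # shape (2, d)
    fused, modality_weights = attention_pool(stacked, params["q_fuse"])
    logit = fused @ params["w"] + params["b"]
    prob = 1.0 / (1.0 + np.exp(-logit))                   # sigmoid for binary output
    return prob, modality_weights

# Hypothetical inputs: 120 acoustic frames and 9 token embeddings, both of size 16.
rng = np.random.default_rng(0)
d = 16
params = {
    "q_speech": rng.normal(size=d),
    "q_text": rng.normal(size=d),
    "q_fuse": rng.normal(size=d),
    "w": rng.normal(size=d),
    "b": 0.0,
}
speech_frames = rng.normal(size=(120, d))
text_tokens = rng.normal(size=(9, d))

prob_pilot, modality_weights = predict_role(speech_frames, text_tokens, params)
# Assumed label convention: 1 = pilot, 0 = controller.
role = "pilot" if prob_pilot >= 0.5 else "controller"
print(role, modality_weights)
```

With untrained random weights the predicted role is meaningless. In a real system the per-modality encoders and all parameters would be learned from labeled controller and pilot utterances.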
Findings
MMSRINet achieves over 98% accuracy on the ATCSpeech corpus.
The multi-modal approach outperforms single-modality methods.
The proposed methods are effective on both seen and unseen data.
Abstract
Automatic spoken instruction understanding (SIU) of controller-pilot conversations in air traffic control (ATC) requires not only recognizing the words and semantics of the speech but also determining the role of the speaker. However, few published works on automatic understanding systems in air traffic communication focus on speaker role identification (SRI). In this paper, we formulate the SRI task of controller-pilot communication as a binary classification problem. Furthermore, text-based, speech-based, and combined speech-and-text multi-modal methods are proposed to achieve a comprehensive comparison on the SRI task. To ablate the impacts of the comparative approaches, various advanced neural network architectures are applied to optimize the implementation of the text-based and speech-based methods. Most importantly, a multi-modal speaker role identification network…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
