Utilizing Multimodal Data for Edge Case Robust Call-sign Recognition and Understanding
Alexander Blatt, Dietrich Klakow

TL;DR
This paper introduces a multimodal architecture for call-sign recognition in air-traffic control that improves robustness in challenging scenarios like noisy or partial transcripts, achieving up to 15% better edge case performance.
Contribution
The paper proposes the CCR multimodal model and CallSBERT architecture, enhancing edge case robustness and training efficiency for ATC call-sign recognition tasks.
Findings
Up to 15% improvement in edge case performance.
CallSBERT is more robust and faster to fine-tune.
Optimizing for edge cases increases overall accuracy.
Abstract
Operational machine-learning-based assistant systems must be robust in a wide range of scenarios. This holds especially true for the air-traffic control (ATC) domain. The robustness of an architecture is particularly evident in edge cases, such as high word error rate (WER) transcripts resulting from noisy ATC recordings or partial transcripts due to clipped recordings. To increase the edge-case robustness of call-sign recognition and understanding (CRU), a core task in ATC speech processing, we propose the multimodal call-sign-command recovery model (CCR). The CCR architecture leads to an increase in the edge-case performance of up to 15%. We demonstrate this on our second proposed architecture, CallSBERT, a CRU model that has fewer parameters, can be fine-tuned noticeably faster, and is more robust during fine-tuning than the state of the art for CRU. Furthermore, we demonstrate that…
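The abstract characterizes edge cases by word error rate (WER). As a rough illustration of that metric only, the sketch below computes WER in its standard form, as the word-level edit distance between a reference and a hypothesis transcript divided by the number of reference words. The function name and the example transcripts are hypothetical and are not taken from the paper or its data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical ATC-style transcripts, invented for illustration.
reference = "lufthansa four five six descend flight level eight zero"
noisy = "lufthansa four five fix descend level eight zero"      # one substitution, one deletion
clipped = "five six descend flight level eight zero"            # start of the recording cut off

print(word_error_rate(reference, noisy))
print(word_error_rate(reference, clipped))
```

In this toy example the clipped transcript loses the airline designator entirely, which hints at why partial transcripts are hard for call-sign recognition even when their WER is moderate.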
Taxonomy
Topics: Hand Gesture Recognition Systems
