On the Use of External Data for Spoken Named Entity Recognition
Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han

TL;DR
This paper explores leveraging external speech and text data to improve low-resource spoken named entity recognition, demonstrating that such approaches can significantly outperform pre-trained models alone.
Contribution
It introduces methods for utilizing external data beyond self-supervised pre-training to enhance spoken NER performance in resource-limited settings.
Findings
External data approaches improve F1 scores by up to 16%.
End-to-end models outperform pipeline models with external data.
End-to-end models focus more on NER-specific words.
Abstract
Spoken language understanding (SLU) tasks involve mapping from speech audio signals to semantic labels. Given the complexity of such tasks, good performance might be expected to require large labeled datasets, which are difficult to collect for each new task and domain. However, recent advances in self-supervised speech representations have made it feasible to consider learning SLU models with limited labeled data. In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task? We draw on a variety of approaches, including self-training, knowledge distillation, and transfer learning, and consider their applicability to both end-to-end models and pipeline (speech recognition followed by text NER model) approaches. We find that…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
