Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu

TL;DR
This paper demonstrates that joint speech-text models can be fine-tuned with minimal speech data to achieve competitive spoken language understanding performance, leveraging shared representations learned during pre-training.
Contribution
It introduces a method to transfer pre-trained speech-text models to speech understanding tasks with very limited data, showing significant data efficiency improvements.
Findings
Models fine-tuned on text transfer effectively to speech data.
Shared representations enable low-data speech understanding.
Layer analysis reveals task-agnostic and task-specific features.
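The first two findings can be illustrated with a toy sketch. This is not the authors' model or code: the joint speech-text encoder is replaced by a hypothetical `make_embedding` function that simply assumes both modalities land near a shared, label-dependent centroid, and the task is a made-up binary classification. Under that assumption, a classification head trained only on "text" embeddings can be evaluated directly on "speech" embeddings without any speech labels.

```python
import math
import random

random.seed(0)
DIM = 8  # toy embedding size (assumption, not from the paper)

def make_embedding(label, modality_noise):
    # Hypothetical stand-in for a joint speech-text encoder: both modalities
    # map to points near a shared, label-dependent centroid.
    centroid = [1.0 if label == 1 else -1.0] * DIM
    return [c + random.gauss(0.0, modality_noise) for c in centroid]

def make_dataset(n, modality_noise):
    # Alternate labels 0, 1, 0, 1, ... so the classes are balanced.
    return [(make_embedding(i % 2, modality_noise), i % 2) for i in range(n)]

def predict_proba(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, epochs=50, lr=0.1):
    # Logistic-regression head trained with plain SGD on the given embeddings.
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def accuracy(weights, bias, data):
    correct = sum(
        (predict_proba(weights, bias, x) >= 0.5) == (y == 1) for x, y in data
    )
    return correct / len(data)

# "Text" embeddings are used for training; "speech" embeddings (simulated with
# more noise) are seen only at test time.
text_train = make_dataset(200, modality_noise=0.5)
speech_test = make_dataset(100, modality_noise=0.8)

weights, bias = train_head(text_train)
speech_acc = accuracy(weights, bias, speech_test)
print(f"zero-shot speech accuracy: {speech_acc:.2f}")
```

The transfer works here only because the simulated modalities share centroids by construction. In a real system, how well the pre-trained model aligns speech and text is what determines whether a text-trained head carries over.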
Abstract
Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space. In this paper, we leverage such shared representations to address the persistent challenge of limited data availability in spoken language understanding tasks. By employing a pre-trained speech-text model, we find that models fine-tuned on text can be effectively transferred to speech testing data. With as little as 1 hour of labeled speech data, our proposed approach achieves comparable performance on spoken language understanding tasks (specifically, sentiment analysis and named entity recognition) when compared to previous methods using speech-only pre-trained models fine-tuned on 10 times more data. Beyond the proof-of-concept study, we also analyze the latent representations. We find that the…
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
Methods: ALIGN
