A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe

TL;DR
This paper investigates how combining various pre-trained models, including self-supervised and supervised models, can enhance spoken language understanding performance across different benchmarks, especially in low-resource settings.
Contribution
It systematically evaluates the impact of different pre-training strategies and model combinations on SLU tasks, highlighting the effectiveness of self-supervised models.
Findings
On the SLUE benchmark, self-supervised pre-trained models outperform supervised pre-trained models.
Pre-trained LM benefits Sentiment Analysis tasks.
Pre-trained speech models improve Named Entity Recognition.
Abstract
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we aim to ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech and language models (LM) pre-trained on large quantities of unpaired data to extract strong speech and text representations. We also explore using supervised models pre-trained on larger external automatic speech recognition (ASR) or SLU corpora. We conduct extensive experiments on the SLU Evaluation (SLUE) benchmark and observe self-supervised pre-trained models to be more powerful, with pre-trained LM and speech models being most…
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
