A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding

Yifan Peng; Siddhant Arora; Yosuke Higuchi; Yushi Ueda; Sujay Kumar; Karthik Ganesan; Siddharth Dalmia; Xuankai Chang; Shinji Watanabe

arXiv:2211.05869·cs.CL·November 17, 2022

TL;DR

This paper investigates how combining various pre-trained models, including self-supervised and supervised models, can enhance spoken language understanding performance across different benchmarks, especially in low-resource settings.

Contribution

It systematically evaluates the impact of different pre-training strategies and model combinations on SLU tasks, highlighting the effectiveness of self-supervised models.

Findings

01

Self-supervised pre-trained models outperform supervised models in SLU.

02

Pre-trained LM benefits Sentiment Analysis tasks.

03

Pre-trained speech models improve Named Entity Recognition.

Abstract

Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we aim to ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech and language models (LM) pre-trained on large quantities of unpaired data to extract strong speech and text representations. We also explore using supervised models pre-trained on larger external automatic speech recognition (ASR) or SLU corpora. We conduct extensive experiments on the SLU Evaluation (SLUE) benchmark and observe self-supervised pre-trained models to be more powerful, with pre-trained LM and speech models being most…

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling