UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets
Zhichao Sheng, Shilin Zhou, Chen Gong, Zhenghua Li

TL;DR
UniSLU introduces a unified framework for multiple spoken language understanding tasks, leveraging heterogeneous datasets and a generative approach to improve performance and task interaction in speech-based applications.
Contribution
The paper presents a unified architecture that jointly models multiple SLU tasks and makes fuller use of heterogeneous datasets. Prior work relied on separate models for individual tasks and did not address this joint setting.
Findings
Achieves superior SLU performance over benchmark methods
Effectively models multiple SLU tasks within a single architecture
Demonstrates improved task interaction and data utilization
Abstract
Spoken Language Understanding (SLU) plays a crucial role in speech-centric multimedia applications, enabling machines to comprehend spoken language in scenarios such as meetings, interviews, and customer service interactions. SLU encompasses multiple tasks, including Automatic Speech Recognition (ASR), spoken Named Entity Recognition (NER), and spoken Sentiment Analysis (SA). However, existing methods often rely on separate model architectures for individual tasks such as spoken NER and SA, which increases system complexity, limits cross-task interaction, and fails to fully exploit heterogeneous datasets available across tasks. To address these limitations, we propose UniSLU, a unified framework that jointly models multiple SLU tasks within a single architecture. Specifically, we propose a unified representation for diverse SLU tasks, enabling full utilization of heterogeneous datasets…
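To make the idea of a unified representation over heterogeneous datasets more concrete, the sketch below shows one way annotations from differently labeled corpora could be serialized into a single generative target. This is an illustration only, not the authors' format: the `Example` class, the `to_target` function, and the `<asr>`, `<ner>`, and `<sa>` markers are all hypothetical names introduced here. It assumes each dataset provides a transcript plus whichever task labels it happens to have.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Example:
    """One utterance from any source dataset; labels a dataset lacks stay None."""
    transcript: str
    entities: Optional[List[Tuple[str, str]]] = None  # (span text, entity type)
    sentiment: Optional[str] = None


def to_target(ex: Example) -> str:
    """Serialize whatever annotations exist into one target string.

    The <asr>/<ner>/<sa> markers are hypothetical, chosen for illustration;
    the paper's actual representation is not shown in this summary.
    """
    parts = [f"<asr> {ex.transcript}"]
    if ex.entities is not None:
        spans = " ; ".join(f"{text} | {etype}" for text, etype in ex.entities)
        parts.append(f"<ner> {spans}".rstrip())
    if ex.sentiment is not None:
        parts.append(f"<sa> {ex.sentiment}")
    return " ".join(parts)


# Two examples from differently annotated (made-up) datasets share one format.
ner_only = Example("alice flew to paris", entities=[("alice", "PER"), ("paris", "LOC")])
sa_only = Example("the service was great", sentiment="positive")

print(to_target(ner_only))
print(to_target(sa_only))
```

Under this assumed scheme, a dataset annotated only for NER and another annotated only for SA both yield valid targets for the same sequence-to-sequence model, which is the kind of data sharing the abstract describes.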
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
