UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets

Zhichao Sheng; Shilin Zhou; Chen Gong; Zhenghua Li

arXiv:2507.12951 · eess.AS · July 18, 2025

TL;DR

UniSLU introduces a unified framework for multiple spoken language understanding tasks, leveraging heterogeneous datasets and a generative approach to improve performance and task interaction in speech-based applications.

Contribution

The paper presents a novel unified architecture for multiple SLU tasks that enables joint modeling and better use of diverse datasets, something prior models, built separately for each task, did not address.

Findings

01 Achieves superior SLU performance over benchmark methods

02 Effectively models multiple SLU tasks within a single architecture

03 Demonstrates improved task interaction and data utilization

Abstract

Spoken Language Understanding (SLU) plays a crucial role in speech-centric multimedia applications, enabling machines to comprehend spoken language in scenarios such as meetings, interviews, and customer service interactions. SLU encompasses multiple tasks, including Automatic Speech Recognition (ASR), spoken Named Entity Recognition (NER), and spoken Sentiment Analysis (SA). However, existing methods often rely on separate model architectures for individual tasks such as spoken NER and SA, which increases system complexity, limits cross-task interaction, and fails to fully exploit heterogeneous datasets available across tasks. To address these limitations, we propose UniSLU, a unified framework that jointly models multiple SLU tasks within a single architecture. Specifically, we propose a unified representation for diverse SLU tasks, enabling full utilization of heterogeneous datasets…
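The abstract describes a unified representation that lets one generative model learn from datasets annotated for different tasks. The excerpt above does not give the actual format, so the following is only a minimal sketch under assumed conventions: the task tokens `<asr>`, `<ner>`, `<sa>`, the bracketed entity markup, and the `<sentiment>` separator are all hypothetical, as are the names `Sample`, `to_unified`, and `build_mixed_set`. It shows how samples from an ASR-only, spoken NER, or spoken SA dataset could each be cast as a (task token, target text) pair for a single sequence decoder.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical task tokens (assumption): the paper's real prompt format is not stated here.
TASK_TOKENS = {"asr": "<asr>", "ner": "<ner>", "sa": "<sa>"}


@dataclass
class Sample:
    audio_id: str
    transcript: str
    # (start_char, end_char, label) spans; only NER datasets provide these.
    entities: Optional[List[Tuple[int, int, str]]] = None
    # Utterance-level sentiment label; only SA datasets provide this.
    sentiment: Optional[str] = None


def to_unified(sample: Sample, task: str) -> Tuple[str, str]:
    """Map one sample to (task_token, target_text) for a shared generative decoder."""
    if task == "asr":
        target = sample.transcript
    elif task == "ner":
        if sample.entities is None:
            raise ValueError("NER target needs entity annotations")
        # Wrap each entity inline as "[LABEL text]"; assumes non-overlapping spans.
        pieces, cursor = [], 0
        for start, end, label in sorted(sample.entities):
            pieces.append(sample.transcript[cursor:start])
            pieces.append(f"[{label} {sample.transcript[start:end]}]")
            cursor = end
        pieces.append(sample.transcript[cursor:])
        target = "".join(pieces)
    elif task == "sa":
        if sample.sentiment is None:
            raise ValueError("SA target needs a sentiment label")
        # Transcript first, then the label after a hypothetical separator token.
        target = f"{sample.transcript} <sentiment> {sample.sentiment}"
    else:
        raise ValueError(f"unknown task: {task}")
    return TASK_TOKENS[task], target


def build_mixed_set(
    datasets: List[Tuple[List[Sample], List[str]]],
) -> List[Tuple[str, str, str]]:
    """Pool heterogeneous datasets; each one yields only the tasks it is labeled for."""
    mixed = []
    for samples, tasks in datasets:
        for sample in samples:
            for task in tasks:
                mixed.append((sample.audio_id, *to_unified(sample, task)))
    return mixed


if __name__ == "__main__":
    ner = Sample("utt1", "John flew to Paris", entities=[(0, 4, "PER"), (13, 18, "LOC")])
    print(to_unified(ner, "ner"))  # prints ('<ner>', '[PER John] flew to [LOC Paris]')
```

Because every dataset is mapped into the same target space, a NER corpus can contribute both `<asr>` and `<ner>` targets from the same utterance, while an SA corpus contributes `<asr>` and `<sa>`. This is one plausible design for a shared representation, not a description of UniSLU's actual format.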

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems