MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
Yuezhang Peng, Chonghao Cai, Ziang Liu, Shuai Fan, Sheng Jiang, Hua Xu, Yuxin Liu, Qiguang Chen, Kele Xu, Yao Li, Sheng Wang, Libo Qin, Xie Chen

TL;DR
This paper introduces MAC-SLU, a challenging multi-intent spoken language understanding dataset for automotive cabins, and benchmarks open-source LLMs and LALMs on it, comparing in-context learning with supervised fine-tuning and end-to-end with pipeline approaches.
Contribution
The paper presents MAC-SLU, a new complex dataset for automotive SLU, and provides a comprehensive benchmark of LLMs and LALMs, highlighting their strengths and weaknesses.
Findings
LLMs perform well with in-context learning but lag behind supervised fine-tuning.
End-to-end LALMs match pipeline approaches and reduce error propagation.
The dataset increases SLU task difficulty with authentic multi-intent data.
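To make "multi-intent" concrete, here is a minimal sketch of how a multi-intent prediction might be represented and scored. The frame format, the intent and slot names, and the example utterance are all hypothetical and are not taken from MAC-SLU; the order-insensitive exact-match score is one common way to evaluate multi-intent SLU, not necessarily the metric the authors use.

```python
from typing import Dict, List, Tuple

# Hypothetical frame format (an assumption, not the MAC-SLU schema):
# one utterance maps to a list of (intent_name, {slot_name: slot_value}).
Frame = List[Tuple[str, Dict[str, str]]]


def normalize(frame: Frame) -> tuple:
    """Return an order-insensitive, hashable form of a frame."""
    return tuple(
        sorted((intent, tuple(sorted(slots.items()))) for intent, slots in frame)
    )


def exact_match(pred: Frame, gold: Frame) -> bool:
    """True only if every intent and every slot value matches."""
    return normalize(pred) == normalize(gold)


def overall_accuracy(preds: List[Frame], golds: List[Frame]) -> float:
    """Fraction of utterances whose whole frame is predicted correctly."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(preds, golds))
    return hits / len(golds)


# Made-up utterance: "open the driver window and set the AC to 22 degrees"
gold: Frame = [
    ("open_window", {"position": "driver"}),
    ("set_ac", {"temperature": "22"}),
]
pred_reordered: Frame = [gold[1], gold[0]]   # same intents, different order
pred_partial: Frame = [gold[0]]              # second intent is missing
```

Under this scoring, `pred_reordered` counts as correct while `pred_partial` does not, which is why an utterance carrying several intents is harder to get fully right than a single-intent one: one missed intent or slot fails the whole utterance.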
Abstract
Spoken Language Understanding (SLU), which aims to extract user semantics to execute downstream tasks, is a crucial component of task-oriented dialog systems. Existing SLU datasets generally lack sufficient diversity and complexity, and there is an absence of a unified benchmark for the latest Large Language Models (LLMs) and Large Audio Language Models (LALMs). This work introduces MAC-SLU, a novel Multi-Intent Automotive Cabin Spoken Language Understanding Dataset, which increases the difficulty of the SLU task by incorporating authentic and complex multi-intent data. Based on MAC-SLU, we conducted a comprehensive benchmark of leading open-source LLMs and LALMs, covering methods like in-context learning, supervised fine-tuning (SFT), and end-to-end (E2E) and pipeline paradigms. Our experiments show that while LLMs and LALMs have the potential to complete SLU tasks through in-context…
Taxonomy
Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Emotion and Mood Recognition
