Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models

Sangmin Song; Juhwan Choi; JungMin Yun; YoungBin Kim

arXiv:2506.10504 · cs.CL · June 13, 2025

TL;DR

This paper evaluates how well large language models track dialogue states in multi-user conversations. It finds a significant performance drop relative to single-user dialogue state tracking (DST) and underscores the need for more robust models in realistic multi-party interactions.

Contribution

It introduces a low-cost methodology for evaluating multi-user DST in LLMs: an existing DST dataset is extended with a second user's utterances, which are generated with an LLM according to speech act theory.
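The augmentation step can be pictured with a minimal sketch, assuming a dialogue is a list of speaker-tagged turns and that a second-user turn is inserted after each first-user turn. The speech-act labels below loosely follow Searle's taxonomy and are hypothetical, as are the names `add_second_user` and `template_generator`; the template function is a stand-in for the LLM call, and the paper's actual label set, prompts, and insertion policy are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Turn:
    speaker: str  # "user1", "user2", or "system"
    text: str


# Hypothetical speech-act labels (loosely after Searle); not the paper's label set.
SPEECH_ACTS = ["assertive", "directive", "commissive", "expressive"]

# A generator takes the dialogue so far plus a speech act and returns an utterance.
Generator = Callable[[List[Turn], str], str]


def template_generator(context: List[Turn], act: str) -> str:
    """Stand-in for an LLM call: echoes the last turn tagged with the speech act."""
    return f"[{act}] response to: {context[-1].text}"


def add_second_user(dialogue: List[Turn], generate: Generator) -> List[Turn]:
    """Insert a second user's utterance after every first-user turn,
    cycling through the speech acts so each type is used in turn."""
    augmented: List[Turn] = []
    act_index = 0
    for turn in dialogue:
        augmented.append(turn)
        if turn.speaker == "user1":
            act = SPEECH_ACTS[act_index % len(SPEECH_ACTS)]
            # The generator sees the context up to and including the first user's turn.
            augmented.append(Turn("user2", generate(augmented, act)))
            act_index += 1
    return augmented


dialogue = [
    Turn("user1", "I need a cheap hotel in the north."),
    Turn("system", "Do you need parking?"),
    Turn("user1", "Yes, free parking please."),
]
for t in add_second_user(dialogue, template_generator):
    print(f"{t.speaker}: {t.text}")
```

Because the original single-user turns and their state annotations are left untouched, any change in a model's predictions can be attributed to the inserted second-user turns, which is what makes the evaluation controlled.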

Findings

01. LLMs show a significant performance drop in multi-user DST compared to single-user scenarios.

02. The study highlights current limitations of LLMs in multi-party dialogue understanding.

03. The results suggest the need to develop specialized models for multi-user dialogue state tracking.

Abstract

Large language models (LLMs) have demonstrated remarkable performance in zero-shot dialogue state tracking (DST), reducing the need for task-specific training. However, conventional DST benchmarks primarily focus on structured user-agent conversations, failing to capture the complexities of real-world multi-user interactions. In this study, we assess the robustness of LLMs in multi-user DST while minimizing dataset construction costs. Inspired by recent advances in LLM-based data annotation, we extend an existing DST dataset by generating utterances of a second user based on speech act theory. Our methodology systematically incorporates a second user's utterances into conversations, enabling a controlled evaluation of LLMs in multi-user settings. Experimental results reveal a significant performance drop compared to single-user DST, highlighting the limitations of current LLMs in…
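To make "performance drop compared to single-user DST" concrete, here is a minimal sketch assuming the comparison uses joint goal accuracy (JGA), the common DST metric in which a turn counts as correct only if every predicted slot-value pair matches the gold state. Whether the paper reports exactly this metric is an assumption, and the helper names and toy states below are hypothetical illustrations, not results from the paper.

```python
from typing import Dict, List

State = Dict[str, str]  # slot name -> value, e.g. {"hotel-area": "north"}


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose full predicted state exactly equals the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


def relative_drop(single_user: float, multi_user: float) -> float:
    """Relative decrease from the single-user score to the multi-user score."""
    if single_user == 0:
        raise ValueError("single-user score must be non-zero")
    return (single_user - multi_user) / single_user


# Toy gold states for two turns (illustrative only).
gold = [{"hotel-area": "north"}, {"hotel-area": "north", "hotel-parking": "yes"}]
# Hypothetical predictions: perfect on the original dialogue, one turn wrong
# once a second user's utterance has been inserted.
pred_single = [dict(s) for s in gold]
pred_multi = [{"hotel-area": "north"}, {"hotel-area": "south", "hotel-parking": "yes"}]

jga_single = joint_goal_accuracy(pred_single, gold)
jga_multi = joint_goal_accuracy(pred_multi, gold)
print(jga_single, jga_multi, relative_drop(jga_single, jga_multi))
```

JGA is strict by design: a single wrong slot, such as a value proposed by the second user but never confirmed, makes the whole turn incorrect, so it is sensitive to exactly the kind of confusion that extra speakers introduce.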
