MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
He Zhang, Wenqian Cui, Haoning Xu, Xiaohui Li, Lei Zhu, Haoli Bai, Shaohua Ma, Irwin King

TL;DR
MTR-DuplexBench is a benchmark for evaluating full-duplex speech language models (FD-SLMs) across multiple conversation rounds, addressing the gap left by existing benchmarks that focus on single-round interactions.
Contribution
The paper introduces MTR-DuplexBench, a novel benchmark that segments dialogues and evaluates multiple aspects of FD-SLMs, enabling more thorough multi-round assessments.
Findings
Current FD-SLMs struggle with multi-round consistency.
The benchmark reveals challenges in maintaining performance across rounds.
Evaluation covers conversational features, dialogue quality, safety, and instruction following.
Abstract
Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience compared to traditional half-duplex models. However, existing benchmarks primarily focus on evaluating single-round interactions, neglecting the complexities of multi-round communication. Evaluating FD-SLMs in multi-round settings poses significant challenges, including blurred turn boundaries in communication and context inconsistency during model inference. Also, existing benchmarks often focus solely on evaluating conversational features, neglecting other critical aspects. To address these gaps, we introduce MTR-DuplexBench, a novel benchmark designed for a comprehensive multi-round evaluation of FD-SLMs. MTR-DuplexBench not only segments continuous full-duplex dialogues into discrete turns for turn-by-turn assessment but also incorporates…
