MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

He Zhang; Wenqian Cui; Haoning Xu; Xiaohui Li; Lei Zhu; Haoli Bai; Shaohua Ma; Irwin King

arXiv:2511.10262 · cs.CL · April 20, 2026

TL;DR

MTR-DuplexBench is a comprehensive benchmark designed to evaluate full-duplex speech language models across multiple conversation rounds, addressing existing evaluation gaps in multi-turn dialogue settings.

Contribution

The paper introduces MTR-DuplexBench, a benchmark that segments continuous full-duplex dialogues into discrete turns and evaluates full-duplex speech language models (FD-SLMs) turn by turn, enabling more thorough multi-round assessment.

Findings

01. Current FD-SLMs struggle with multi-round consistency.

02. The benchmark reveals challenges in maintaining performance across rounds.

03. The evaluation covers conversational features, dialogue quality, safety, and instruction following.

Abstract

Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience than traditional half-duplex models. However, existing benchmarks primarily evaluate single-round interactions, neglecting the complexities of multi-round communication. Evaluating FD-SLMs in multi-round settings poses significant challenges, including blurred turn boundaries in communication and context inconsistency during model inference. In addition, existing benchmarks often evaluate only conversational features, neglecting other critical aspects. To address these gaps, we introduce MTR-DuplexBench, a novel benchmark designed for a comprehensive multi-round evaluation of FD-SLMs. MTR-DuplexBench not only segments continuous full-duplex dialogues into discrete turns for turn-by-turn assessment but also incorporates…

Code & Models

Repositories

ZhangHe0918/MTR-DuplexBench
github

Datasets

Jeff0918/MTR-DuplexBench
dataset · 2.8k dl
