Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models

Guan-Ting Lin; Shih-Yun Shan Kuan; Qirui Wang; Jiachen Lian; Tingle Li; Shinji Watanabe; Hung-yi Lee

arXiv:2507.23159 · eess.AS · April 28, 2026

TL;DR

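This paper introduces Full-Duplex-Bench v1.5, an automated benchmark for evaluating how full-duplex speech models handle overlapping speech in dialogue. It simulates four overlap scenarios (user interruption, user backchannel, talking to others, and background speech) and finds two divergent response strategies among the five agents tested.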

Contribution

The paper presents the first fully automated, open-source benchmark for systematically assessing overlap handling in full-duplex speech models. Its metrics cover categorical dialogue behaviors, stop and response latency, and prosodic adaptation.

Findings

01. The benchmark reveals two main response strategies: rapid response and floor-holding.

02. The framework supports both open-source and commercial API-based models for reproducible evaluation.

03. Analysis of five state-of-the-art agents shows divergent overlap-management behaviors.

Abstract

Full-duplex spoken dialogue systems promise to transform human-machine interaction from a rigid, turn-based protocol into a fluid, natural conversation. However, the central challenge to realizing this vision, managing overlapping speech, remains critically under-evaluated. We introduce Full-Duplex-Bench v1.5, the first fully automated benchmark designed to systematically probe how models behave during speech overlap. The benchmark simulates four representative overlap scenarios: user interruption, user backchannel, talking to others, and background speech. Our framework, compatible with open-source and commercial API-based models, provides a comprehensive suite of metrics analyzing categorical dialogue behaviors, stop and response latency, and prosodic adaptation. Benchmarking five state-of-the-art agents reveals two divergent strategies: a responsive approach prioritizing rapid…
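
To make the latency metrics named in the abstract more concrete, here is a minimal sketch of how stop latency and response latency could be computed from speech timestamps. It is not the paper's implementation. The definitions used here (stop latency as the time from overlap onset until the model's current utterance ends, response latency as the time from the end of the user's overlapping speech until the model's next utterance begins) are assumptions, and the names `overlap_timing`, `OverlapTiming`, and `Segment` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical representation: one model utterance as (start_sec, end_sec).
Segment = Tuple[float, float]


@dataclass
class OverlapTiming:
    stop_latency: Optional[float]      # None if the model was silent at overlap onset
    response_latency: Optional[float]  # None if the model never speaks after the user


def overlap_timing(
    model_segments: List[Segment], user_start: float, user_end: float
) -> OverlapTiming:
    """Assumed definitions, for illustration only (not taken from the paper)."""
    segments = sorted(model_segments)

    # Model utterance that is in progress when the user starts speaking.
    active = next((s for s in segments if s[0] <= user_start < s[1]), None)
    stop_latency = None if active is None else active[1] - user_start

    # First model utterance that begins at or after the end of the user's speech.
    following = next((s for s in segments if s[0] >= user_end), None)
    response_latency = None if following is None else following[0] - user_end

    return OverlapTiming(stop_latency, response_latency)


# The model talks from 0.0 s to 3.5 s and again from 5.0 s to 8.0 s;
# the user overlaps from 3.0 s to 4.5 s.
timing = overlap_timing([(0.0, 3.5), (5.0, 8.0)], user_start=3.0, user_end=4.5)
print(timing)
```

Under these assumed definitions, a model that keeps talking through the overlap shows a stop latency longer than the user's speech itself, which is one simple way to tell a floor-holding strategy from a responsive one.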
