FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems

Borui Liao; Yulong Xu; Jiao Ou; Kaiyuan Yang; Weihua Jian; Pengfei Wan; Di Zhang

arXiv:2502.13472 · cs.CL · May 30, 2025

Open Access

TL;DR

FlexDuo is a modular full-duplex control system for speech dialogue. By filtering noise and explicitly managing dialogue states, it reduces false interruptions and improves response accuracy, making human-machine interaction more natural.

Contribution

The paper presents FlexDuo, a novel plug-and-play full-duplex control module with an explicit Idle state, decoupling control from dialogue systems for improved performance and flexibility.

Findings

01. Reduces false interruption rate by 24.9%.

02. Improves response accuracy by 7.6%.

03. Outperforms VAD-controlled systems in Chinese and English dialogues.

Abstract

Full-Duplex Speech Dialogue Systems (Full-Duplex SDS) have significantly enhanced the naturalness of human-machine interaction by enabling real-time bidirectional communication. However, existing approaches face challenges such as difficulties in independent module optimization and contextual noise interference due to highly coupled architectural designs and oversimplified binary state modeling. This paper proposes FlexDuo, a flexible full-duplex control module that decouples duplex control from spoken dialogue systems through a plug-and-play architectural design. Furthermore, inspired by human information-filtering mechanisms in conversations, we introduce an explicit Idle state. On one hand, the Idle state filters redundant noise and irrelevant audio to enhance dialogue quality. On the other hand, it establishes a semantic integrity-based buffering mechanism, reducing the risk of…
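
The control flow the abstract describes can be illustrated with a small state machine. This is a minimal sketch under stated assumptions, not the authors' implementation: the state names Listen and Speak, the `DuplexController` class, the `toy_predict` rule, and the text "chunks" standing in for audio are all hypothetical. Only the explicit Idle state and the separation of the controller from the dialogue system come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class State(Enum):
    IDLE = "idle"      # incoming audio is ignored (noise, irrelevant speech)
    LISTEN = "listen"  # user audio is buffered for the dialogue system
    SPEAK = "speak"    # the dialogue system is producing a reply


@dataclass
class DuplexController:
    # Hypothetical per-chunk state predictor: (current state, chunk) -> next state.
    predict: Callable[[State, str], State]
    # Hypothetical downstream dialogue system, plugged in as a plain callable.
    respond: Callable[[str], str]
    state: State = State.IDLE
    buffer: List[str] = field(default_factory=list)
    replies: List[str] = field(default_factory=list)

    def step(self, chunk: str) -> State:
        nxt = self.predict(self.state, chunk)
        if nxt is State.LISTEN:
            # Buffer the chunk until the user's turn is judged complete.
            self.buffer.append(chunk)
        elif nxt is State.SPEAK and self.buffer:
            # Hand the buffered turn to the dialogue system, then reset.
            self.replies.append(self.respond(" ".join(self.buffer)))
            self.buffer.clear()
        # In IDLE the chunk is dropped and never reaches the dialogue system.
        self.state = nxt
        return nxt


def toy_predict(state: State, chunk: str) -> State:
    # Stand-in rule for illustration only; a real controller would be a model.
    if chunk == "<noise>":
        # Noise never interrupts an ongoing reply; otherwise it is ignored.
        return State.SPEAK if state is State.SPEAK else State.IDLE
    if chunk == "<end>":
        return State.SPEAK
    return State.LISTEN


ctrl = DuplexController(predict=toy_predict, respond=lambda turn: f"reply to: {turn}")
states = [ctrl.step(c) for c in ["<noise>", "what", "time", "is it", "<end>", "<noise>"]]
```

Because the dialogue system is passed in as an ordinary callable, it can be swapped without touching the controller, which is the sense of "plug-and-play" assumed in this sketch. A binary Listen/Speak controller would have to treat the leading `<noise>` chunk as user input; here it is dropped in Idle.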

Taxonomy

Topics: Speech and dialogue systems · Robotics and Automated Systems