DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action
Haoyang Zhang, Jun Chen, Donghang Wu, Yuxin Li, Yuxin Zhang, Xiangyu Tony Zhang, Che Liu, Qingjian Lin, Yizhou Peng, Hexin Liu, Eng Siong Chng, Chao Yan, Boyong Wu, Yechang Huang, Xuerui Yang, and Fei Tian

TL;DR
DuplexSLA is a native full-duplex speech-language-action model that listens, speaks, plans, and calls tools continuously within a real-time conversation.
Contribution
The paper presents a dual-stream, three-channel architecture in which a single backbone handles in-conversation planning and tool calling, without an external cascade.
Findings
Joint decoding of speech and actions on a shared timeline.
Semantic-driven turn-taking control within the backbone.
DuplexSLA-Bench, constructed for comprehensive evaluation.
Abstract
Recent advances in spoken dialogue language models have shifted from turn-based to full-duplex designs, where the model continuously listens to the user while generating responses. However, existing duplex backbones still lack a native channel for in-conversation planning and tool calling, leaving real-time agentic behaviour either tied to turn boundaries or relegated to an external cascade. We propose DuplexSLA, a native full-duplex Speech-Language-Action foundation model that decodes assistant audio together with a structured action stream on a shared 160 ms chunk timeline. DuplexSLA is built on a dual-stream three-channel formulation: a continuous user audio channel, a discrete assistant audio channel, and a rate-limited textual action channel, all decoded jointly by a single backbone, so that listening, speaking, planning, and tool calling unfold on one shared clock. Two…
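To make the shared-clock idea concrete, the following is a minimal sketch of how three channels could be laid out on a common 160 ms chunk timeline. It illustrates the data layout only, not the authors' implementation or the model itself. The 160 ms chunk length comes from the abstract; the 16 kHz sample rate, the number of assistant audio tokens per chunk, the per-chunk action-token cap, the action token strings, and every name in the code (`Chunk`, `build_timeline`, `MAX_ACTION_TOKENS`) are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

CHUNK_MS = 160            # chunk length stated in the abstract
SAMPLE_RATE_HZ = 16_000   # assumption: not stated in the source
MAX_ACTION_TOKENS = 2     # assumption: hypothetical per-chunk rate limit

SAMPLES_PER_CHUNK = SAMPLE_RATE_HZ * CHUNK_MS // 1000  # 2560 under these assumptions


@dataclass
class Chunk:
    index: int
    user_audio: List[float]       # continuous user audio channel
    assistant_tokens: List[int]   # discrete assistant audio channel
    action_tokens: List[str]      # rate-limited textual action channel

    @property
    def start_ms(self) -> int:
        # Every channel in this chunk shares the same start time.
        return self.index * CHUNK_MS


def build_timeline(
    user_samples: List[float],
    assistant_tokens: List[int],
    action_queue: List[str],
    tokens_per_chunk: int = 2,    # assumption: hypothetical codec rate
) -> Tuple[List[Chunk], List[str]]:
    """Align the three channels chunk by chunk; return timeline and leftover actions."""
    n_chunks = -(-len(user_samples) // SAMPLES_PER_CHUNK)  # ceiling division
    queue = list(action_queue)
    timeline = []
    for i in range(n_chunks):
        lo, hi = i * SAMPLES_PER_CHUNK, (i + 1) * SAMPLES_PER_CHUNK
        audio = list(user_samples[lo:hi])
        audio += [0.0] * (SAMPLES_PER_CHUNK - len(audio))  # zero-pad the last chunk
        speech = assistant_tokens[i * tokens_per_chunk:(i + 1) * tokens_per_chunk]
        # Rate limiting: emit at most MAX_ACTION_TOKENS action tokens per chunk.
        actions, queue = queue[:MAX_ACTION_TOKENS], queue[MAX_ACTION_TOKENS:]
        timeline.append(Chunk(i, audio, speech, actions))
    return timeline, queue


# Usage: 6000 samples span three chunks; the action tokens are made up.
timeline, leftover = build_timeline(
    user_samples=[0.0] * 6000,
    assistant_tokens=list(range(6)),
    action_queue=["<call>", "search", "(", "x", ")"],
)
for chunk in timeline:
    print(chunk.start_ms, chunk.assistant_tokens, chunk.action_tokens)
```

Because the action channel is capped per chunk in this sketch, a long tool call is spread over several consecutive chunks rather than stalling the audio channels, which is one plausible reading of "rate-limited" in the abstract.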
