DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities

Xiangyu Lu; Wang Xu; Haoyu Wang; Hongyun Zhou; Haiyan Zhao; Conghui Zhu; Tiejun Zhao; Muyun Yang

arXiv:2502.11123 · cs.CL · April 4, 2025

TL;DR

DuplexMamba is a novel multimodal duplex model that enables real-time, streaming speech-to-text conversations with simultaneous input processing and output generation, improving efficiency for human-machine interactions.

Contribution

It introduces a Mamba-based end-to-end model with a duplex decoding strategy for real-time speech-to-text conversation. The model supports streaming input and output while achieving performance comparable to Transformer-based models.
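To make "simultaneous input processing and output generation" concrete, the toy loop below interleaves listening and speaking in one pass over time steps. It is only an illustrative sketch, not the paper's duplex decoding strategy: the function `duplex_loop`, the `respond` callback, the `tokens_per_step` budget, the use of `None` for silence, and the rule that new speech cancels a reply in progress are all assumptions made for this example.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Event = Tuple[str, str]  # (event kind, payload); hypothetical event format


def duplex_loop(
    chunks: Iterable[Optional[str]],
    respond: Callable[[List[str]], List[str]],
    tokens_per_step: int = 2,
) -> Iterator[Event]:
    """Interleave input chunks with output tokens (illustrative only).

    `chunks` yields one item per time step: a piece of user speech,
    or None for silence. `respond` stands in for the model and maps
    the speech heard so far to a list of reply tokens.
    """
    heard: List[str] = []    # user speech accumulated for the current turn
    pending: List[str] = []  # reply tokens planned but not yet emitted

    for chunk in chunks:
        if chunk is not None:
            # New user speech: drop any reply in progress, keep listening.
            if pending:
                pending = []
                yield ("interrupted", chunk)
            heard.append(chunk)
            yield ("listen", chunk)
        elif heard:
            # Silence after speech: assume the turn ended and plan a reply.
            pending = list(respond(heard))
            heard = []

        # Emit a bounded number of tokens before reading the next chunk,
        # so input is never blocked for the length of a whole reply.
        for _ in range(min(tokens_per_step, len(pending))):
            yield ("speak", pending.pop(0))

    # Input stream ended: flush whatever is left of the last reply.
    for token in pending:
        yield ("speak", token)


if __name__ == "__main__":
    stream = ["hi", None, None, "wait", None]
    for event in duplex_loop(stream, lambda heard: ["a", "b", "c", "d", "e"]):
        print(event)
```

Because only a few tokens are emitted per step, an interruption such as "wait" is noticed while the reply is still being produced, whereas a turn-based loop would finish the reply first.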

Findings

01. Achieves real-time duplex speech processing with performance comparable to Transformer models.

02. Supports dynamic streaming input and output for natural conversations.

03. Demonstrates effectiveness on speech recognition and voice assistant benchmarks.

Abstract

Real-time speech conversation is essential for natural and efficient human-machine interactions, requiring duplex and streaming capabilities. Traditional Transformer-based conversational chatbots operate in a turn-based manner and exhibit quadratic computational complexity that grows as the input size increases. In this paper, we propose DuplexMamba, a Mamba-based end-to-end multimodal duplex model for speech-to-text conversation. DuplexMamba enables simultaneous input processing and output generation, dynamically adjusting to support real-time streaming. Specifically, we develop a Mamba-based speech encoder and adapt it with a Mamba-based language model. Furthermore, we introduce a novel duplex decoding strategy that enables DuplexMamba to process input and generate output simultaneously. Experimental results demonstrate that DuplexMamba successfully implements duplex and streaming…
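The contrast the abstract draws with quadratic-cost attention can be illustrated with a toy state-space recurrence: each new input updates a fixed-size state, so the per-step cost does not grow with the amount of audio already consumed. This is a minimal sketch under stated assumptions, not Mamba's selective scan and not the authors' encoder: the class `StreamingSSM`, its scalar input, and the fixed diagonal parameters `a`, `b`, `c` are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class StreamingSSM:
    """Toy diagonal linear state-space layer (hypothetical simplification).

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimensions)
    y_t = sum(c * h_t)
    """

    a: List[float]
    b: List[float]
    c: List[float]
    state: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.state:
            self.state = [0.0] * len(self.a)

    def step(self, x: float) -> float:
        # Constant work per input: only the fixed-size state is touched.
        self.state = [
            a_i * h_i + b_i * x
            for a_i, b_i, h_i in zip(self.a, self.b, self.state)
        ]
        return sum(c_i * h_i for c_i, h_i in zip(self.c, self.state))

    def process_chunk(self, chunk: Iterable[float]) -> List[float]:
        # The state persists across calls, so chunks can arrive as a stream.
        return [self.step(x) for x in chunk]


if __name__ == "__main__":
    params = dict(a=[0.5, 0.9], b=[1.0, 0.1], c=[1.0, 2.0])
    signal = [1.0, 0.0, -1.0, 2.0, 0.5, 0.0]

    offline = StreamingSSM(**params).process_chunk(signal)

    streaming_model = StreamingSSM(**params)
    streamed = streaming_model.process_chunk(signal[:2])
    streamed += streaming_model.process_chunk(signal[2:])

    print(offline == streamed)
```

Feeding the signal in two chunks gives the same outputs as feeding it all at once, because the carried state summarizes everything seen so far. A standard attention layer would instead need to attend over the whole history at each step.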

Code & Models

Repositories

khfs/DuplexMamba · jax · Official

Taxonomy

Topics: Speech and Audio Processing · Speech and dialogue systems · Speech Recognition and Synthesis