TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR

Qingshun She; Jing Peng; Yangui Fang; Yu Xi; Kai Yu

arXiv:2602.11546 · eess.AS · March 3, 2026

TL;DR

This paper introduces TC-BiMamba, a method that makes dynamic chunk size training practical for bidirectional Mamba (BiMamba) in unified streaming and non-streaming ASR. It improves training efficiency and, because it captures bidirectional context, also improves model performance.

Contribution

The paper proposes Trans-Chunk BiMamba (TC-BiMamba), an approach that enables dynamic chunk size training for BiMamba-based ASR models while reducing training overhead and improving performance.

Findings

01. Achieves a 1.3x training speedup over traditional chunk-wise processing

02. Reduces training memory by 50% relative to traditional chunk-wise processing

03. Outperforms U2++ and matches LC-BiMamba with a smaller model size

Abstract

This work investigates bidirectional Mamba (BiMamba) for unified streaming and non-streaming automatic speech recognition (ASR). Dynamic chunk size training enables a single model to perform both offline decoding and streaming decoding under various latency settings. In contrast, the existing BiMamba-based streaming method is limited to fixed chunk size decoding, and applying dynamic chunk size training to it substantially increases training overhead. To tackle this issue, we propose Trans-Chunk BiMamba (TC-BiMamba) for dynamic chunk size training. The Trans-Chunk mechanism trains the sequences of both directions in an offline style with dynamic chunk size. On the one hand, compared to traditional chunk-wise processing, TC-BiMamba simultaneously achieves a 1.3x training speedup, reduces training memory by 50%, and improves model performance because it can capture bidirectional context. On the other hand,…
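To make "dynamic chunk size training" and "chunk-wise processing" concrete, here is a toy sketch, not the paper's implementation. Assumptions: a scalar recurrence `h_t = a * h_{t-1} + x_t` stands in for a Mamba selective scan; the forward direction carries state across chunk boundaries while the backward direction is restarted inside each chunk (so future context is limited to the current chunk); and the chunk sampler is a simplified U2-style rule with assumed values (50% full utterance, otherwise uniform in [1, 25]). All function names are hypothetical, and the Trans-Chunk mechanism itself is not reproduced here.

```python
import random
from typing import List


def sample_chunk_size(seq_len: int, rng: random.Random,
                      full_prob: float = 0.5, max_chunk: int = 25) -> int:
    """Simplified U2-style dynamic chunk size (hypothetical helper, assumed values).

    With probability `full_prob` the whole utterance is one chunk (offline mode);
    otherwise a chunk size is drawn uniformly from [1, max_chunk] (streaming mode).
    """
    if rng.random() < full_prob:
        return seq_len
    return rng.randint(1, max_chunk)


def linear_scan(xs: List[float], a: float) -> List[float]:
    """Toy scalar recurrence h_t = a * h_{t-1} + x_t, a stand-in for an SSM scan."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + x
        out.append(h)
    return out


def chunkwise_bidirectional(xs: List[float], chunk: int, a: float = 0.5) -> List[float]:
    """Sum of a forward scan and a chunk-restricted backward scan (toy model).

    Forward (assumption): one causal scan over the whole sequence, so the state
    crosses chunk boundaries. Backward (assumption): a reversed scan restarted in
    every chunk, so each frame sees future frames only within its own chunk.
    """
    fwd = linear_scan(xs, a)
    bwd: List[float] = []
    # Sequential per-chunk loop: the style of chunk-wise processing the abstract
    # compares against.
    for start in range(0, len(xs), chunk):
        seg = xs[start:start + chunk]
        bwd.extend(reversed(linear_scan(seg[::-1], a)))
    return [f + b for f, b in zip(fwd, bwd)]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
    chunk = sample_chunk_size(len(xs), rng)  # resampled per batch in training
    ys = chunkwise_bidirectional(xs, chunk)
    print(f"chunk size {chunk}, first output {ys[0]:.4f}")
```

Setting `chunk=len(xs)` recovers the fully bidirectional offline scan, while smaller chunks give streaming behavior in which outputs for earlier chunks do not depend on later frames. This is why a single model trained with randomly sampled chunk sizes can, in principle, serve both decoding modes.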

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques