Mamba for Streaming ASR Combined with Unimodal Aggregation

Ying Fang; Xiaofei Li

arXiv:2410.00070 · eess.AS · December 30, 2024

TL;DR

This paper applies Mamba, a recently proposed state space model, to streaming ASR. It adds a lookahead mechanism and a streaming unimodal aggregation (UMA) method to improve accuracy and reduce latency in real-time speech recognition.

Contribution

The paper proposes a Mamba encoder with a lookahead mechanism, a streaming unimodal aggregation (UMA) method, and a UMA-based early termination (ET) technique for efficient streaming ASR.
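The lookahead idea can be illustrated with a toy streaming recurrence. This sketch is hypothetical and is not the authors' encoder: it replaces Mamba's selective, input-dependent parameters with fixed scalars `a`, `b`, `c`, uses 1-D float features, and assumes that lookahead is injected by averaging the current frame with the next `lookahead` frames before the state update. The class `StreamingLinearSSM` and its `push`/`flush` methods are illustrative names. It shows two properties relevant to the paper: per-frame work and memory do not grow with stream length (so a stream of T frames costs O(T)), and a lookahead of L frames delays every output by exactly L frames.

```python
from collections import deque
from typing import Deque, List


class StreamingLinearSSM:
    """Toy streaming layer: h_t = a*h_{t-1} + b*u_t, y_t = c*h_t.

    Hypothetical illustration only. Mamba makes its parameters input-dependent
    (selective); here a, b, c are fixed scalars and features are 1-D floats.
    u_t is the mean of frame t and up to `lookahead` future frames.
    """

    def __init__(self, a: float, b: float, c: float, lookahead: int) -> None:
        self.a, self.b, self.c = a, b, c
        self.lookahead = lookahead
        self.buffer: Deque[float] = deque()  # frame t plus the future frames seen so far
        self.h = 0.0  # recurrent state; memory does not grow with stream length

    def _step(self) -> float:
        # Lookahead input: average of the current frame and its buffered future frames.
        u = sum(self.buffer) / len(self.buffer)
        self.h = self.a * self.h + self.b * u
        self.buffer.popleft()  # frame t is done; frame t+1 becomes current
        return self.c * self.h

    def push(self, x: float) -> List[float]:
        """Feed one frame; return the output that became ready, if any."""
        self.buffer.append(x)
        # Frame t is ready once frames t..t+lookahead are all buffered.
        if len(self.buffer) > self.lookahead:
            return [self._step()]
        return []

    def flush(self) -> List[float]:
        """End of stream: emit remaining frames with truncated lookahead."""
        return [self._step() for _ in range(len(self.buffer))]


ssm = StreamingLinearSSM(a=0.5, b=1.0, c=1.0, lookahead=1)
ys: List[float] = []
for x in [1.0, 2.0, 3.0]:
    ys += ssm.push(x)  # nothing is emitted for the first frame: it waits for frame 2
ys += ssm.flush()
print(ys)  # [1.5, 3.25, 4.625]
```

Averaging future frames is only one simple way to expose controllable future context to a causal recurrence; the paper's actual lookahead mechanism may be built differently.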

Findings

01

Achieves competitive accuracy on two Mandarin Chinese datasets.

02

Reduces recognition latency through early termination.

03

Demonstrates the efficiency of a Mamba encoder for streaming ASR.

Abstract

This paper addresses streaming automatic speech recognition (ASR). Mamba, a recently proposed state space model, has been shown to match or surpass Transformers on various tasks while benefiting from linear complexity. We explore the efficiency of a Mamba encoder for streaming ASR and propose an associated lookahead mechanism for leveraging controllable future information. Additionally, a streaming-style unimodal aggregation (UMA) method is implemented, which automatically detects token activity, triggers token output in a streaming manner, and aggregates feature frames to learn better token representations. Based on UMA, an early termination (ET) method is proposed to further reduce recognition latency. Experiments conducted on two Mandarin Chinese datasets demonstrate that the proposed model achieves competitive ASR performance in terms of both recognition…
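A frame-by-frame sketch of streaming unimodal aggregation follows. It rests on stated assumptions rather than the paper's code: each frame carries a feature x_t and a scalar weight alpha_t in (0, 1); segment boundaries are the valleys of the weight sequence (alpha_t <= alpha_{t-1} and alpha_t <= alpha_{t+1}); and each segment is reduced to the alpha-weighted mean of its frames. The class `StreamingUMA`, the 1-D features, the rule that the first frame is never a valley, and the choice that a valley frame closes the segment on its left are all hypothetical. Because a valley at frame t can only be confirmed once frame t+1 arrives, each token is triggered one frame after its segment ends.

```python
from typing import List, Optional, Tuple

Frame = Tuple[float, float]  # (feature x_t, UMA weight alpha_t)


class StreamingUMA:
    """Hypothetical frame-by-frame unimodal aggregation sketch.

    Assumptions: alpha_t in (0, 1); frame t is a valley (segment boundary) if
    alpha_t <= alpha_{t-1} and alpha_t <= alpha_{t+1}; the first frame is never
    a valley; a valley frame closes the segment on its left.
    """

    def __init__(self) -> None:
        self.segment: List[Frame] = []           # frames of the currently open segment
        self.prev_alpha: Optional[float] = None  # alpha_{t-1}
        self.pending: Optional[Frame] = None     # frame t, waiting for alpha_{t+1}

    def _emit(self) -> float:
        # Aggregate the closed segment into one token: alpha-weighted mean of x.
        num = sum(a * x for x, a in self.segment)
        den = sum(a for _, a in self.segment)
        self.segment = []
        return num / den

    def push(self, x: float, alpha: float) -> List[float]:
        """Feed frame t+1; only now can frame t be classified as a valley or not."""
        out: List[float] = []
        if self.pending is not None:
            px, pa = self.pending
            self.segment.append((px, pa))
            is_valley = (
                self.prev_alpha is not None and pa <= self.prev_alpha and pa <= alpha
            )
            if is_valley:
                out.append(self._emit())  # token triggered one frame after its segment ends
            self.prev_alpha = pa
        self.pending = (x, alpha)
        return out

    def flush(self) -> List[float]:
        """End of stream: the last frame closes the final segment."""
        if self.pending is not None:
            self.segment.append(self.pending)
            self.pending = None
        return [self._emit()] if self.segment else []


uma = StreamingUMA()
frames = [(1.0, 0.25), (3.0, 0.5), (5.0, 0.25), (2.0, 0.75), (4.0, 0.25)]
tokens: List[float] = []
for x, a in frames:
    tokens += uma.push(x, a)
tokens += uma.flush()
print(tokens)  # [3.0, 2.5]
```

In a real system the aggregated vectors would feed the token classifier instead of being printed; early termination, which the paper builds on top of UMA, is not modelled here.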

Code & Models

Repositories

Audio-WestlakeU/UMA-ASR
PyTorch · Official

Taxonomy

Topics: IoT-based Smart Home Systems

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces