Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization

Andrei Andrusenko; Vladimir Bataev; Lilit Grigoryan; Nune Tadevosyan; Vitaly Lavrukhin; Boris Ginsburg

arXiv:2604.19079 · eess.AS · April 22, 2026

TL;DR

This paper introduces a unified Transducer (RNNT) speech recognition model that supports both offline and streaming decoding, and uses mode-consistency regularization (MCR-RNNT) to improve low-latency streaming accuracy while preserving offline performance.

Contribution

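The paper presents a unified RNNT-based ASR framework built on chunk-limited attention with right context and dynamic chunked convolutions, together with a mode-consistency regularization method (MCR-RNNT) that encourages agreement between the offline and streaming training modes.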
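
To make the idea of mode-consistency regularization concrete, here is a minimal Python sketch. It is not the paper's MCR-RNNT objective or its Triton kernel, neither of which is specified in this summary. It assumes a generic symmetric-KL penalty between the output distributions that the same model produces in offline and streaming modes. The function names and the list-of-logit-vectors layout are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl(log_p, log_q):
    """KL(p || q) for two distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mode_consistency_penalty(offline_logits, streaming_logits):
    """Mean symmetric KL between two modes (assumed form, not the paper's).

    Each argument is a list of logit vectors, one vector per output position.
    """
    total = 0.0
    for off, strm in zip(offline_logits, streaming_logits):
        lp, lq = log_softmax(off), log_softmax(strm)
        total += 0.5 * (kl(lp, lq) + kl(lq, lp))
    return total / len(offline_logits)


# Toy example: two positions, vocabulary of three symbols.
offline = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
streaming = [[1.5, 0.7, -0.8], [0.1, 0.2, 0.3]]
print(mode_consistency_penalty(offline, streaming))
```

In a Transducer, each position would correspond to a (time frame, label index) node of the joint network output. One plausible way to use such a term, again an assumption, is to add it with a small weight to the sum of the RNNT losses computed in the two modes.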

Findings

01 Improved streaming accuracy at low latency.

02 Maintained offline recognition performance.

03 Scalable to larger models and datasets.

Abstract

Unification of automatic speech recognition (ASR) systems reduces development and maintenance costs, but training a single model to perform well in both offline and low-latency streaming settings remains challenging. We present a Unified ASR framework for Transducer (RNNT) training that supports both offline and streaming decoding within a single model, using chunk-limited attention with right context and dynamic chunked convolutions. To further close the gap between offline and streaming performance, we introduce an efficient Triton implementation of mode-consistency regularization for RNNT (MCR-RNNT), which encourages agreement across training modes. Experiments show that the proposed approach improves streaming accuracy at low latency while preserving offline performance and scaling to larger model sizes and training datasets. The proposed Unified ASR framework and the English model…
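
As an illustration of chunk-limited attention with right context, the following Python sketch builds a boolean attention mask. The chunking semantics here are an assumption, not taken from the paper: frames are grouped into fixed-size chunks, and each frame may attend to its own chunk, a fixed number of previous chunks, and a few look-ahead frames past its chunk. All function and parameter names are hypothetical.

```python
def chunk_attention_mask(num_frames, chunk_size, left_chunks, right_context):
    """Return mask[i][j] == True when query frame i may attend to key frame j.

    Assumed semantics (illustrative only):
    - frames are split into consecutive chunks of `chunk_size`;
    - a frame sees its own chunk, `left_chunks` earlier chunks, and
      `right_context` extra frames beyond the end of its chunk.
    """
    mask = []
    for i in range(num_frames):
        chunk_idx = i // chunk_size
        start = max(0, (chunk_idx - left_chunks) * chunk_size)
        end = min(num_frames, (chunk_idx + 1) * chunk_size + right_context)
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


def offline_mask(num_frames):
    """Full-context mask: every frame attends to every other frame."""
    return [[True] * num_frames for _ in range(num_frames)]


# Streaming-style mask: 8 frames, chunks of 2, one left chunk, 1 look-ahead frame.
mask = chunk_attention_mask(num_frames=8, chunk_size=2, left_chunks=1, right_context=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions, a single chunk that spans the whole utterance reproduces the full-context mask, which is one way a single model could switch between streaming and offline behavior by changing only the mask.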

Code & Models

Models

nvidia/parakeet-unified-en-0.6b (Hugging Face)
model · 783 downloads · 43 likes
