Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models

Yunkyu Lim; Jihwan Park; Hyung Yong Kim; Hanbin Lee; Byeong-Yeol Kim

arXiv:2508.19671 · eess.AS · August 28, 2025

TL;DR

This paper introduces Hybrid Decoding, which pairs a lightweight fast decoder with the standard Transformer decoder: the fast decoder drafts an output, and the Transformer decoder verifies it and selectively corrects it where needed. The goal is faster inference and fewer repetition errors in speech recognition.

Contribution

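The paper proposes a hybrid decoding approach that extends Transformer encoder-decoder models by attaching a lightweight fast decoder to the pretrained encoder. The fast decoder handles rapid inference, while the original Transformer decoder performs selective correction, improving both speed and robustness to repetition.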

Findings

01. More than doubled inference speed on speech recognition benchmarks.

02. Achieved comparable or better word error rates than baseline models.

03. Enhanced robustness against repetitive errors in recognition outputs.

Abstract

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference. Additionally, although rare, repetition can occur and negatively affect recognition accuracy. To tackle these challenges, we propose a novel Hybrid Decoding approach that both accelerates inference and alleviates the issue of repetition. Our method extends the transformer encoder-decoder architecture by attaching a lightweight, fast decoder to the pretrained encoder. During inference, the fast decoder rapidly generates an output, which is then verified and, if necessary, selectively corrected by the Transformer decoder. This results in faster decoding and improved robustness against repetitive errors. Experiments on the LibriSpeech and GigaSpeech test…
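
The draft-then-verify flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the verification criterion or the correction rule, so the sketch assumes one simple scheme in which the draft is accepted up to the first token where the full decoder disagrees, and the full decoder then continues autoregressively from that point. The names `hybrid_decode`, `fast_decode`, and `next_token` are hypothetical, and the toy models at the bottom stand in for real decoders.

```python
from typing import Callable, List, Sequence

Token = int
# Hypothetical interfaces (assumptions, not from the paper):
# fast_decode(enc) returns a whole draft sequence in one shot;
# next_token(enc, prefix) returns the full decoder's choice for the next token.
FastDecode = Callable[[Sequence[float]], List[Token]]
NextToken = Callable[[Sequence[float], Sequence[Token]], Token]


def hybrid_decode(enc: Sequence[float], fast_decode: FastDecode,
                  next_token: NextToken, eos: Token,
                  max_len: int = 64) -> List[Token]:
    """Accept the fast draft while the full decoder agrees, then correct."""
    draft = fast_decode(enc)
    out: List[Token] = []
    for tok in draft:
        # Verification step. A real model could score every draft position
        # in a single teacher-forced pass; this loop is for clarity only.
        expected = next_token(enc, out)
        if expected != tok:
            # Assumed correction rule: keep the verified prefix and let the
            # full decoder finish autoregressively from the disagreement.
            out.append(expected)
            while out[-1] != eos and len(out) < max_len:
                out.append(next_token(enc, out))
            return out
        out.append(tok)
    return out  # the whole draft was verified, no correction needed


# Toy stand-ins for the two decoders (purely illustrative).
EOS = 0
REFERENCE = [1, 2, 3, EOS]


def toy_next_token(enc: Sequence[float], prefix: Sequence[Token]) -> Token:
    # Pretend the full decoder always reproduces REFERENCE.
    return REFERENCE[len(prefix)] if len(prefix) < len(REFERENCE) else EOS


def toy_fast_decode(enc: Sequence[float]) -> List[Token]:
    # Pretend the fast decoder produced a repetition error.
    return [1, 2, 2, 2, EOS]


corrected = hybrid_decode([0.0], toy_fast_decode, toy_next_token, EOS)
```

In this toy run the first two draft tokens are accepted and only the repeated tail is regenerated, which is where a speedup would come from when the draft is mostly correct. A different acceptance rule (for example, a confidence threshold) would slot into the same structure.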
