NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding

Vladimir Bataev; Andrei Andrusenko; Lilit Grigoryan; Aleksandr Laptev; Vitaly Lavrukhin; Boris Ginsburg

arXiv:2505.22857 · eess.AS · May 30, 2025

TL;DR

NGPU-LM is a GPU-optimized n-gram language model for context-biasing in greedy ASR decoding. It eliminates more than 50% of the accuracy gap between greedy and beam search in out-of-domain scenarios while adding less than 7% computational overhead.

Contribution

This work introduces NGPU-LM, a rethought data structure for statistical n-gram language models that supports fast, parallel GPU inference. It enables customizable greedy decoding for all major ASR model types: transducers, attention encoder-decoder models, and CTC.

Findings

01. Eliminates more than 50% of the accuracy gap between greedy and beam search decoding in out-of-domain scenarios, without the significant slowdown of beam search.

02. Adds less than 7% computational overhead for customizable greedy decoding.

03. Releases an open-source implementation for industrial and research use.

Abstract

Statistical n-gram language models are widely used for context-biasing tasks in Automatic Speech Recognition (ASR). However, existing implementations lack computational efficiency due to poor parallelization, making context-biasing less appealing for industrial use. This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference. Our approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types - including transducers, attention encoder-decoder models, and CTC - with less than 7% computational overhead. The proposed approach can eliminate more than 50% of the accuracy gap between greedy and beam search for out-of-domain scenarios while avoiding significant slowdown caused by beam search. The implementation of the proposed NGPU-LM is open-sourced.
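To make the idea of LM-biased greedy decoding concrete, the following is a minimal sketch, not the NGPU-LM implementation. It assumes the n-gram scores are combined with the ASR scores as a weighted sum of log-probabilities (commonly called shallow fusion), which the abstract does not spell out. The class `ToyBigramLM`, the function `fused_greedy_step`, the weight `lm_weight`, and all numbers are hypothetical and chosen only for illustration. The plain Python loops stand in for what would be batched tensor operations on a GPU.

```python
import math


class ToyBigramLM:
    """Hypothetical toy bigram LM with backoff to unigrams (illustration only)."""

    def __init__(self, unigram_logp, bigram_logp, backoff_logw):
        self.unigram_logp = unigram_logp  # list: log P(token)
        self.bigram_logp = bigram_logp    # dict: (prev, token) -> log P(token | prev)
        self.backoff_logw = backoff_logw  # dict: prev -> log backoff weight

    def score_all_tokens(self, prev):
        """Return LM log-scores for every vocabulary token given the previous token."""
        backoff = self.backoff_logw.get(prev, 0.0)
        return [
            # Use the bigram score if it exists, otherwise back off to the unigram.
            self.bigram_logp.get((prev, tok), backoff + self.unigram_logp[tok])
            for tok in range(len(self.unigram_logp))
        ]


def fused_greedy_step(asr_logp_batch, prev_tokens, lm, lm_weight):
    """Pick one token per hypothesis from ASR scores plus weighted LM scores."""
    chosen = []
    for asr_logp, prev in zip(asr_logp_batch, prev_tokens):
        lm_logp = lm.score_all_tokens(prev)
        fused = [a + lm_weight * l for a, l in zip(asr_logp, lm_logp)]
        chosen.append(max(range(len(fused)), key=fused.__getitem__))
    return chosen


# Toy setup: 3-token vocabulary; the LM strongly prefers token 1 after token 2.
lm = ToyBigramLM(
    unigram_logp=[math.log(1 / 3)] * 3,
    bigram_logp={(2, 1): math.log(0.9)},
    backoff_logw={2: math.log(0.1)},
)
asr_batch = [
    [math.log(0.5), math.log(0.4), math.log(0.1)],  # hypothesis 0
    [math.log(0.2), math.log(0.3), math.log(0.5)],  # hypothesis 1
]
prev_tokens = [2, 0]

without_lm = fused_greedy_step(asr_batch, prev_tokens, lm, lm_weight=0.0)
with_lm = fused_greedy_step(asr_batch, prev_tokens, lm, lm_weight=0.5)
```

With these toy numbers, the LM flips the first hypothesis from token 0 to token 1, while the second hypothesis is unchanged because its context has no bigram entries and backs off to a uniform unigram distribution. Scoring the whole vocabulary for every hypothesis at each step is the operation that benefits from a parallel-friendly data structure.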

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing

Methods: Softmax · Attention Is All You Need