GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition

Hugo Braun; Justin Luitjens; Ryan Leary; Tim Kaldewey; Daniel Povey

arXiv:1910.10032·cs.CL·February 17, 2020

TL;DR

This paper introduces a GPU-accelerated Viterbi lattice decoder that significantly improves speed and memory efficiency for speech recognition, enabling real-time streaming and large graph processing on diverse hardware.

Contribution

The paper presents a novel GPU-based Viterbi decoder with optimized memory, I/O, and parallelism, outperforming existing decoders in speed and scalability for speech recognition tasks.

Findings

01. Up to 240x speedup over single-core CPU decoding

02. Up to 40x faster decoding than the current state-of-the-art GPU decoder

03. Supports larger decoding graphs and more simultaneous streams

Abstract

We present an optimized weighted finite-state transducer (WFST) decoder capable of online streaming and offline batch processing of audio using Graphics Processing Units (GPUs). The decoder is efficient in both memory utilization and input/output (I/O) bandwidth, and it uses a novel Viterbi implementation designed to maximize parallelism. The reduced memory footprint allows the decoder to process significantly larger graphs than previously possible, while the optimized I/O increases the number of simultaneous streams supported. GPU preprocessing of lattice segments enables intermediate lattice results to be returned to the requestor during streaming inference. Collectively, the proposed algorithm yields up to a 240x speedup over single-core CPU decoding, and up to 40x faster decoding than the current state-of-the-art GPU decoder, while returning equivalent results. This decoder design enables…
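For readers unfamiliar with WFST decoding, the following is a minimal sequential sketch of frame-synchronous Viterbi token passing with beam pruning. It is not the paper's GPU implementation: the toy graph `ARCS`, the function `viterbi_decode`, and the cost values are all hypothetical, epsilon arcs are omitted, and only the single best path is kept rather than a lattice. The inner loop over arcs is the kind of work a GPU decoder would typically spread across threads.

```python
import math

# Hypothetical toy decoding graph. Each arc is
# (source state, destination state, input label, output label, graph cost).
# Costs are negative log-probabilities, so lower is better.
ARCS = [
    (0, 1, 0, "a", 0.5),
    (0, 2, 1, "b", 0.3),
    (1, 1, 0, "", 0.2),   # self-loop that emits no output label
    (1, 3, 1, "c", 0.4),
    (2, 3, 0, "d", 0.1),
]
START = 0
FINAL = {3}


def viterbi_decode(acoustic_costs, beam=10.0):
    """Return (best_cost, output_labels) for a list of per-frame cost vectors.

    acoustic_costs[t][label] is the acoustic cost of `label` at frame t.
    """
    # One token per active state: state -> (accumulated cost, output labels).
    tokens = {START: (0.0, [])}
    for frame in acoustic_costs:
        if not tokens:
            return math.inf, []
        best = min(cost for cost, _ in tokens.values())
        new_tokens = {}
        # Each arc can be expanded independently of the others; the only
        # shared step is the per-destination minimum below.
        for src, dst, ilabel, olabel, graph_cost in ARCS:
            if src not in tokens:
                continue
            cost, labels = tokens[src]
            if cost > best + beam:
                continue  # beam pruning: drop tokens far from the best one
            new_cost = cost + graph_cost + frame[ilabel]
            if dst not in new_tokens or new_cost < new_tokens[dst][0]:
                new_labels = labels + [olabel] if olabel else labels
                new_tokens[dst] = (new_cost, new_labels)
        tokens = new_tokens
    finals = [tokens[s] for s in FINAL if s in tokens]
    if not finals:
        return math.inf, []
    return min(finals, key=lambda token: token[0])


if __name__ == "__main__":
    # Two frames, two input labels per frame.
    cost, labels = viterbi_decode([[0.1, 1.0], [1.0, 0.1]])
    print(cost, labels)
```

Keeping only the minimum-cost token per destination state is what makes this Viterbi decoding; a lattice decoder would additionally retain the competing arcs within the beam so that alternative hypotheses can be recovered later.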

Code & Models

Repositories

nvidia-riva/riva-asrlib-decoder