FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities

Lilit Grigoryan; Vladimir Bataev; Nikolay Karpov; Andrei Andrusenko; Vitaly Lavrukhin; Boris Ginsburg

arXiv:2508.07315 · eess.AS · August 14, 2025

Open Access

TL;DR

FlexCTC is a GPU-accelerated, fully batched CTC beam decoding toolkit written in Python. It speeds up decoding and improves recognition accuracy through contextual features such as N-gram language model fusion and phrase boosting, while remaining easy to use.

Contribution

It provides the first fully GPU-based, Python-compatible CTC beam decoder with advanced contextualization, surpassing traditional CPU or C++ implementations in speed and flexibility.

Findings

1. Achieves faster decoding than CPU-based decoders.
2. Supports GPU-powered N-gram language model fusion and phrase boosting.
3. Offers an open-source, extensible toolkit for research and production.
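Phrase boosting (context biasing) rewards hypotheses that spell out user-supplied phrases. The page does not describe how FlexCTC implements it, so the following is only a hedged sketch of one common approach: a token-level prefix trie in which each matched token earns a bonus, a partial match that breaks off has its bonus revoked, and a completed phrase keeps it. `TrieNode`, `PhraseBooster`, `step`, `finalize`, and `bonus` are hypothetical names, not FlexCTC's API. This simplified version has no Aho-Corasick failure links, so it can miss overlapping matches.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence, Tuple


@dataclass
class TrieNode:
    children: Dict[int, "TrieNode"] = field(default_factory=dict)
    is_end: bool = False  # True if a boosted phrase ends at this node


# (current trie node, number of matched tokens whose bonus is not yet committed)
State = Tuple[TrieNode, int]


class PhraseBooster:
    """Hypothetical token-level phrase booster (simplified: no failure links)."""

    def __init__(self, phrases: Sequence[Sequence[int]], bonus: float = 1.0) -> None:
        self.bonus = bonus
        self.root = TrieNode()
        for phrase in phrases:
            node = self.root
            for tok in phrase:
                node = node.children.setdefault(tok, TrieNode())
            node.is_end = True

    def initial_state(self) -> State:
        return (self.root, 0)

    def step(self, state: State, tok: int) -> Tuple[State, float]:
        """Advance by one emitted token; return the new state and the score change."""
        node, pending = state
        delta = 0.0
        child = node.children.get(tok)
        if child is None:
            # The partial match broke off: revoke its uncommitted bonus, retry from the root.
            delta -= pending * self.bonus
            node, pending = self.root, 0
            child = node.children.get(tok)
            if child is None:
                return (self.root, 0), delta
        delta += self.bonus
        pending += 1
        if child.is_end:
            pending = 0  # a full phrase was matched, so its bonus becomes permanent
        if not child.children:
            child = self.root  # nothing longer can match from here
        return (child, pending), delta

    def finalize(self, state: State) -> float:
        """At the end of a hypothesis, revoke any unfinished partial match."""
        return -state[1] * self.bonus

    def score_sequence(self, tokens: Sequence[int]) -> float:
        """Total boosting bonus for a complete token sequence."""
        state, total = self.initial_state(), 0.0
        for tok in tokens:
            state, delta = self.step(state, tok)
            total += delta
        return total + self.finalize(state)


if __name__ == "__main__":
    booster = PhraseBooster([[1, 2, 3]], bonus=1.0)
    print(booster.score_sequence([1, 2, 3]), booster.score_sequence([1, 2, 4]))  # prints: 3.0 0.0
```

In a beam decoder, each hypothesis would carry its own booster state and add the returned delta (scaled by a boosting weight) to its score whenever it emits a non-blank token. A batched GPU version would plausibly store the trie as flat tensors so that all hypotheses advance in one lookup, but that is an assumption about design, not a description of FlexCTC.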

Abstract

While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware, we present FlexCTC, a novel open-source toolkit for fully GPU-based beam decoding of Connectionist Temporal Classification (CTC) models. Developed entirely in Python and PyTorch, it offers a fast, user-friendly, and extensible alternative to traditional C++, CUDA, or WFST-based decoders. The toolkit features a high-performance, fully batched GPU implementation that eliminates CPU-GPU synchronization and minimizes kernel launch overhead via CUDA Graphs. It also supports advanced contextualization techniques, including GPU-powered N-gram language model fusion and phrase-level boosting. These features enable accurate and efficient decoding, making them suitable for both research and…
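The abstract contrasts beam search with greedy decoding and mentions N-gram language model fusion. As a rough illustration of those ideas only, here is a minimal single-utterance CTC prefix beam search in plain Python (no PyTorch, no batching, no GPU), with greedy decoding for comparison. Shallow fusion is modeled as adding `lm_weight * lm_score(prefix, token)` whenever a non-blank token extends a prefix. `lm_score`, `lm_weight`, and the other function names are hypothetical and do not describe FlexCTC's API or its batched GPU implementation.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def greedy_decode(log_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best token per frame, then collapse repeats and drop blanks."""
    out: List[int] = []
    prev = None
    for frame in log_probs:
        tok = max(range(len(frame)), key=frame.__getitem__)
        if tok != blank and tok != prev:
            out.append(tok)
        prev = tok
    return out


def prefix_beam_search(
    log_probs: Sequence[Sequence[float]],
    beam_size: int = 4,
    blank: int = 0,
    lm_score: Optional[Callable[[Tuple[int, ...], int], float]] = None,
    lm_weight: float = 0.0,
) -> List[int]:
    """CTC prefix beam search with an optional (hypothetical) shallow-fusion hook."""
    # Each prefix stores [log P(ending in blank), log P(ending in non-blank)].
    beams: Dict[Tuple[int, ...], List[float]] = {(): [0.0, NEG_INF]}
    for frame in log_probs:
        nxt: Dict[Tuple[int, ...], List[float]] = defaultdict(lambda: [NEG_INF, NEG_INF])
        for prefix, (pb, pnb) in beams.items():
            total = log_add(pb, pnb)
            for tok, lp in enumerate(frame):
                if tok == blank:
                    # Emitting blank keeps the prefix unchanged.
                    nxt[prefix][0] = log_add(nxt[prefix][0], total + lp)
                    continue
                ext = prefix + (tok,)
                # Shallow fusion: add the weighted LM log-prob when a new token is appended.
                fusion = lm_weight * lm_score(prefix, tok) if lm_score is not None else 0.0
                if prefix and prefix[-1] == tok:
                    # A repeat with no blank in between collapses into the same prefix...
                    nxt[prefix][1] = log_add(nxt[prefix][1], pnb + lp)
                    # ...while a repeat after a blank emits a genuinely new token.
                    nxt[ext][1] = log_add(nxt[ext][1], pb + lp + fusion)
                else:
                    nxt[ext][1] = log_add(nxt[ext][1], total + lp + fusion)
        # Keep the beam_size prefixes with the highest total score.
        ranked = sorted(nxt.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best = max(beams, key=lambda p: log_add(*beams[p]))
    return list(best)


if __name__ == "__main__":
    # Two frames, vocabulary {0: blank, 1: "a"}, P(blank) = 0.6 and P(a) = 0.4 per frame.
    lp = [[math.log(0.6), math.log(0.4)]] * 2
    print(greedy_decode(lp), prefix_beam_search(lp))  # prints: [] [1]
```

In the demo, greedy decoding picks blank in both frames and returns an empty sequence, while beam search returns `[a]`: the three alignments `a a`, `a ∅`, and `∅ a` all collapse to `a` and together carry probability 0.16 + 0.24 + 0.24 = 0.64, versus 0.36 for the empty output. A fully batched GPU decoder would vectorize this same prefix-merging logic across utterances and beam entries instead of looping in Python.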

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research