Composing Finite State Transducers on GPUs
Arturo Argueta, David Chiang

TL;DR
This paper presents what the authors believe to be the first GPU implementation of finite-state transducer (FST) composition, reporting speedups of up to 6x over their own serial implementation and up to 4.5x over OpenFST.
Contribution
The paper introduces a GPU implementation of the FST composition operation and discusses the optimizations used to achieve the best performance on this architecture.
Findings
Up to 6x speedup over the authors' serial implementation
Up to 4.5x speedup over OpenFST
Optimizations for FST composition tailored to the GPU architecture
Abstract
Weighted finite-state transducers (FSTs) are frequently used in language processing to handle tasks such as part-of-speech tagging and speech recognition. There has been previous work using multiple CPU cores to accelerate finite state algorithms, but limited attention has been given to parallel graphics processing unit (GPU) implementations. In this paper, we introduce the first (to our knowledge) GPU implementation of the FST composition operation, and we also discuss the optimizations used to achieve the best performance on this architecture. We show that our approach obtains speedups of up to 6x over our serial implementation and 4.5x over OpenFST.
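For readers unfamiliar with the operation, the following is a minimal serial sketch of weighted FST composition. It is not the paper's GPU algorithm. It assumes epsilon-free transducers and the tropical semiring (weights add along a path), and the dictionary layout and the function name `compose` are hypothetical choices made for illustration. Each pair of arcs leaving a state pair is matched independently of the others, which is the kind of work a parallel implementation can distribute.

```python
from collections import deque


def compose(a, b):
    """Compose two epsilon-free weighted FSTs (tropical semiring).

    Hypothetical layout, assumed for this sketch only:
      "start":  start state
      "finals": {state: final_weight}
      "arcs":   {state: [(in_label, out_label, weight, next_state), ...]}
    """
    start = (a["start"], b["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}

    while queue:
        qa, qb = queue.popleft()

        # A pair state is final only if both components are final.
        if qa in a["finals"] and qb in b["finals"]:
            finals[(qa, qb)] = a["finals"][qa] + b["finals"][qb]

        out = arcs.setdefault((qa, qb), [])
        # Match every arc of A against every arc of B at this state pair.
        for in_label, mid, w1, next_a in a["arcs"].get(qa, []):
            for mid2, out_label, w2, next_b in b["arcs"].get(qb, []):
                if mid == mid2:  # A's output label must equal B's input label
                    nxt = (next_a, next_b)
                    out.append((in_label, out_label, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)

    return {"start": start, "finals": finals, "arcs": arcs}


# Toy example: A rewrites "a" as "x", B rewrites "x" as "y".
fst_a = {"start": 0, "finals": {1: 0.0}, "arcs": {0: [("a", "x", 1.0, 1)]}}
fst_b = {"start": 0, "finals": {1: 0.0}, "arcs": {0: [("x", "y", 0.5, 1)]}}
result = compose(fst_a, fst_b)
```

In the toy example, the composed transducer has a single arc from state `(0, 0)` to state `(1, 1)` that maps "a" to "y" with the two weights summed.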
