Parallel Composition of Weighted Finite-State Transducers
Shubho Sengupta, Vineel Pratap, Awni Hannun

TL;DR
This paper introduces a GPU-based parallel algorithm for composing weighted finite-state transducers. It scales better with input size than sequential CPU composition and, on large graphs such as those used in speech recognition, can run 10 to 30 times faster.
Contribution
A parallel composition algorithm for weighted FSTs, implemented on GPUs, that can be as much as 10 to 30 times faster than a sequential CPU algorithm on large graphs.
Findings
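The parallel algorithm scales better with the size of the input graphs than sequential CPU composition.
On large graphs it can be as much as 10 to 30 times faster than a sequential CPU algorithm.
Benchmarks cover the composition of random graphs and of graphs commonly used in speech recognition.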
Abstract
Finite-state transducers (FSTs) are frequently used in speech recognition. Transducer composition is an essential operation for combining different sources of information at different granularities. However, composition is also one of the more computationally expensive operations. Due to the heterogeneous structure of FSTs, parallel algorithms for composition are suboptimal in efficiency, generality, or both. We propose an algorithm for parallel composition and implement it on graphics processing units. We benchmark our parallel algorithm on the composition of random graphs and the composition of graphs commonly used in speech recognition. The parallel composition scales better with the size of the input graphs and for large graphs can be as much as 10 to 30 times faster than a sequential CPU algorithm.
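To make the composition operation concrete, here is a minimal sequential sketch in Python. It is not the paper's parallel GPU algorithm; it is the kind of CPU baseline that such an algorithm is compared against. It assumes epsilon-free transducers and tropical-semiring weights (weights along a path are added). The dict layout and the name `compose` are hypothetical, chosen only for illustration.

```python
from collections import deque


def compose(a, b):
    """Compose two epsilon-free weighted transducers (illustrative layout).

    Each transducer is a dict with:
      "start": start state id
      "final": set of accepting state ids
      "arcs":  {state: [(ilabel, olabel, weight, next_state), ...]}
    States of the result are pairs (state_in_a, state_in_b).
    """
    start = (a["start"], b["start"])
    arcs = {}
    final = set()
    queue = deque([start])
    seen = {start}
    while queue:
        sa, sb = queue.popleft()
        # A pair is accepting only if both component states are accepting.
        if sa in a["final"] and sb in b["final"]:
            final.add((sa, sb))
        out = arcs.setdefault((sa, sb), [])
        for ia, oa, wa, na in a["arcs"].get(sa, []):
            for ib, ob, wb, nb in b["arcs"].get(sb, []):
                # Arcs match when a's output label equals b's input label.
                if oa != ib:
                    continue
                nxt = (na, nb)
                # Assumption: tropical semiring, so path weights are summed.
                out.append((ia, ob, wa + wb, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {"start": start, "final": final, "arcs": arcs}


# Usage: a maps "a" to "x"; b maps "x" to "y" (and "z" to "w", which never matches).
a = {"start": 0, "final": {1}, "arcs": {0: [("a", "x", 1.0, 1)]}}
b = {"start": 0, "final": {1},
     "arcs": {0: [("x", "y", 0.5, 1), ("z", "w", 2.0, 1)]}}
c = compose(a, b)
print(c["arcs"][(0, 0)])
```

The nested loop over arc pairs at each state pair is where the cost concentrates, and the number of matching arcs varies widely from one state pair to the next. This sketch explores only state pairs reachable from the start and does not prune pairs that cannot reach an accepting state.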
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Network Packet Processing and Optimization
