Efficient Segmental Cascades for Speech Recognition
Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

TL;DR
This paper presents methods to improve the efficiency of discriminative segmental models in speech recognition, enabling faster decoding, pruning, and training while maintaining competitive performance.
Contribution
It explores three ways to make segmental cascades practical and efficient: reducing the feature set in the first pass, frame subsampling, and various pruning approaches.
Findings
Maintains competitive phonetic recognition performance
Greatly reduces decoding, pruning, and training time
Achieves these gains with a combination of the efficiency techniques
Abstract
Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition. However, their appeal has been limited by their computational requirements, due to the large number of possible segments to consider. Multi-pass cascades of segmental models introduce features of increasing complexity in different passes, where in each pass a segmental model rescores lattices produced by a previous (simpler) segmental model. In this paper, we explore several ways of making segmental cascades efficient and practical: reducing the feature set in the first pass, frame subsampling, and various pruning approaches. In experiments on phonetic recognition, we find that with a combination of such techniques, it is possible to maintain competitive performance while greatly reducing decoding, pruning, and training time.
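To make the "large number of possible segments" concrete, the following is a minimal sketch of how the size of the segment search space depends on the number of frames, and how frame subsampling shrinks it. It is not the authors' implementation: the helper names `count_segments` and `subsample`, the 300-frame utterance, the maximum segment duration of 30 frames, and the use of plain decimation for subsampling are all assumptions chosen for illustration.

```python
def count_segments(num_frames, max_duration):
    """Count contiguous segments lasting between 1 and max_duration frames.

    A segment of duration d can start at any of (num_frames - d + 1) positions.
    """
    longest = min(max_duration, num_frames)
    return sum(num_frames - d + 1 for d in range(1, longest + 1))


def subsample(frames, factor):
    """Keep every `factor`-th frame (plain decimation, an illustrative assumption)."""
    return frames[::factor]


# Hypothetical utterance: 300 frames, segments of at most 30 frames.
frames = list(range(300))
full = count_segments(len(frames), max_duration=30)

# After subsampling by 2, the same time span covers half as many frames,
# so the maximum duration in frames is halved as well.
reduced_frames = subsample(frames, 2)
half = count_segments(len(reduced_frames), max_duration=15)

print(full, half)
```

Under these assumptions the segment count drops from 8565 to 2145, roughly a factor of four, before any label set or pruning is taken into account. Each remaining segment is scored once per label, so the saving carries through to feature computation as well.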
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
