Efficient Segmental Cascades for Speech Recognition

Hao Tang; Weiran Wang; Kevin Gimpel; Karen Livescu

arXiv:1608.00929 · cs.CL · August 3, 2016

TL;DR

This paper presents methods for improving the efficiency of discriminative segmental models in speech recognition. These methods make decoding and training faster while keeping phonetic recognition performance competitive.

Contribution

It introduces techniques that make segmental cascades practical and efficient: reducing the feature set in the first pass, frame subsampling, and pruning.

Findings

01 Maintains competitive phonetic recognition performance

02 Greatly reduces decoding, pruning, and training time

03 Combines the efficiency techniques effectively

Abstract

Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition. However, their appeal has been limited by their computational requirements, due to the large number of possible segments to consider. Multi-pass cascades of segmental models introduce features of increasing complexity in different passes, where in each pass a segmental model rescores lattices produced by a previous (simpler) segmental model. In this paper, we explore several ways of making segmental cascades efficient and practical: reducing the feature set in the first pass, frame subsampling, and various pruning approaches. In experiments on phonetic recognition, we find that with a combination of such techniques, it is possible to maintain competitive performance while greatly reducing decoding, pruning, and training time.
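The abstract says segmental models are expensive because of "the large number of possible segments to consider," and it lists frame subsampling and pruning as remedies. The sketch below illustrates only that general arithmetic. It is not the authors' implementation. The helper names (`num_segments`, `subsample`, `beam_prune`), the maximum segment duration, and the simple beam-threshold pruning rule are all illustrative assumptions. The paper's own pruning approaches may differ.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def num_segments(num_frames: int, max_dur: int) -> int:
    """Count segments (s, e) with 1 <= e - s <= max_dur over num_frames frames.

    Each segment must also be scored for every label, so the hypothesis
    space is roughly this count times the number of labels.
    """
    total = 0
    for start in range(num_frames):
        # A segment starting here can end at most max_dur frames later,
        # but not past the end of the utterance.
        total += min(max_dur, num_frames - start)
    return total


def subsample(frames: Sequence, factor: int) -> List:
    """Keep every `factor`-th frame (hypothetical: simple decimation only)."""
    return list(frames[::factor])


def beam_prune(segment_scores: Dict[Segment, float], beam: float) -> Dict[Segment, float]:
    """Keep segments scoring within `beam` of the best one.

    A generic beam-style rule, assumed here for illustration; it is not
    necessarily the pruning method used in the paper.
    """
    best = max(segment_scores.values())
    return {seg: s for seg, s in segment_scores.items() if s >= best - beam}


if __name__ == "__main__":
    # Assumed setting: 300 frames (e.g. 3 s at a 10 ms hop), segments of
    # up to 30 frames. Subsampling by 2 halves both quantities.
    full = num_segments(300, 30)
    half = num_segments(150, 15)
    print(full, half, full / half)
```

Running the script prints `8565 2145` followed by a ratio just under 4. Subsampling by a factor of 2 halves both the number of frames and the maximum segment length in frames, so the segment count drops roughly fourfold. Pruning then shrinks the set of segments (or lattice arcs) that a later, more expensive pass in the cascade has to rescore.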

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing