Kernel Approximation Methods for Speech Recognition

Avner May; Alireza Bagheri Garakani; Zhiyun Lu; Dong Guo; Kuan Liu; Aurélien Bellet; Linxi Fan; Michael Collins; Daniel Hsu; Brian Kingsbury; Michael Picheny; Fei Sha

arXiv:1701.03577·stat.ML·January 25, 2019·43 cites

TL;DR

This paper studies large-scale kernel methods for acoustic modeling in speech recognition. It introduces new techniques for feature selection and for monitoring training, and reports performance comparable to deep neural networks (DNNs) on four speech recognition datasets.

Contribution

The paper proposes new methods for feature selection and training monitoring that improve kernel acoustic models enough to make them competitive with DNNs.
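The feature-selection idea (explore many random non-linear features while keeping only a compact set) can be illustrated with a minimal sketch. It assumes random Fourier features for a Gaussian kernel, scores each feature by the magnitude of its learned weight, and uses a toy single-output regression trained by plain SGD as a stand-in for the real acoustic model. All function names, the scoring rule, and the schedule (keep the top `num_features`, then add a fresh batch of the same size each round) are hypothetical and not the paper's exact procedure.

```python
import math
import random


def draw_feature(rng, input_dim, sigma):
    """Draw one random Fourier feature (w, b) for a Gaussian kernel of width sigma."""
    w = [rng.gauss(0.0, 1.0 / sigma) for _ in range(input_dim)]
    return w, rng.uniform(0.0, 2.0 * math.pi)


def featurize(x, feats):
    """Map x to sqrt(2/D) * cos(w . x + b) for each of the D features in feats."""
    scale = math.sqrt(2.0 / len(feats))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in feats]


def train_linear(X, y, feats, epochs=5, lr=0.5):
    """Hypothetical stand-in for model training: SGD on squared loss
    for a single-output linear model over the random features."""
    theta = [0.0] * len(feats)
    for _ in range(epochs):
        for x, target in zip(X, y):
            z = featurize(x, feats)
            err = sum(t * zi for t, zi in zip(theta, z)) - target
            theta = [t - lr * err * zi for t, zi in zip(theta, z)]
    return theta


def select_features(X, y, num_features, num_iters, sigma, seed=0):
    """Keep num_features features while exploring num_iters * num_features
    random candidates in total (assumed schedule)."""
    rng = random.Random(seed)
    input_dim = len(X[0])
    selected = []
    for _ in range(num_iters):
        # Pool = surviving features + a fresh batch of random candidates.
        fresh = [draw_feature(rng, input_dim, sigma) for _ in range(num_features)]
        pool = selected + fresh
        theta = train_linear(X, y, pool)
        # Score each feature by |weight| (assumed rule) and keep the strongest.
        ranked = sorted(range(len(pool)), key=lambda i: abs(theta[i]), reverse=True)
        selected = [pool[i] for i in ranked[:num_features]]
    return selected


# Toy usage: 1-D regression of sin(3x), purely illustrative data.
X = [[i / 10.0] for i in range(-20, 21)]
y = [math.sin(3.0 * x[0]) for x in X]
feats = select_features(X, y, num_features=20, num_iters=3, sigma=0.5, seed=1)
print(len(feats))  # 20
```

The model never holds more than twice the target number of features, yet over several rounds it examines many more candidates than it keeps; that trade-off is the point the sketch is meant to show.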

Findings

1. Kernel models achieved performance comparable to DNNs.
2. Feature selection reduced the number of random features needed.
3. Monitoring frame-level metrics improved recognition accuracy.
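Training monitoring of this kind can be pictured with a minimal sketch. It assumes "monitoring" means computing frame accuracy and cross-entropy on held-out frames after each epoch and stopping once held-out cross-entropy has not improved for a fixed number of checks. The patience rule, the `EarlyStopper` class, and the per-epoch values below are hypothetical illustrations, not the paper's criterion or results.

```python
import math


def frame_metrics(posteriors, labels):
    """Frame-level accuracy and mean cross-entropy (natural log).

    posteriors: per-frame lists of class probabilities; labels: true class ids.
    """
    n = len(labels)
    correct = sum(
        1 for p, t in zip(posteriors, labels)
        if max(range(len(p)), key=lambda k: p[k]) == t  # argmax == label
    )
    # Clamp tiny probabilities so log() never sees zero.
    xent = -sum(math.log(max(p[t], 1e-12)) for p, t in zip(posteriors, labels)) / n
    return correct / n, xent


class EarlyStopper:
    """Hypothetical rule: stop after `patience` checks without improvement."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = math.inf
        self.bad_checks = 0

    def should_stop(self, heldout_xent):
        if heldout_xent < self.best:
            self.best = heldout_xent
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Four toy held-out frames over three states.
posteriors = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
labels = [0, 1, 0, 0]
acc, xent = frame_metrics(posteriors, labels)
print(acc, round(xent, 4))  # 0.75 0.5737

heldout_xent_per_epoch = [1.0, 0.8, 0.85, 0.9]  # made-up illustration values
stopper = EarlyStopper(patience=2)
for epoch, value in enumerate(heldout_xent_per_epoch):
    if stopper.should_stop(value):
        print("stop after epoch", epoch)  # stop after epoch 3
        break
```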

Abstract

We study large-scale kernel methods for acoustic modeling in speech recognition and compare their performance to deep neural networks (DNNs). We perform experiments on four speech recognition datasets, including the TIMIT and Broadcast News benchmark tasks, and compare these two types of models on frame-level performance metrics (accuracy, cross-entropy), as well as on recognition metrics (word/character error rate). In order to scale kernel methods to these large datasets, we use the random Fourier feature method of Rahimi and Recht (2007). We propose two novel techniques for improving the performance of kernel acoustic models. First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection. The method is able to explore a large number of non-linear features while maintaining a compact model more…
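The random Fourier feature method of Rahimi and Recht (2007) replaces a shift-invariant kernel with an explicit finite-dimensional map whose inner products approximate the kernel, so a linear model on the mapped features stands in for a kernel machine. Below is a minimal sketch for the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); the function names and parameter values are illustrative assumptions, and pure Python is used only to keep the example dependency-free.

```python
import math
import random


def sample_rff(input_dim, num_features, sigma, seed=0):
    """Draw (W, b) for random Fourier features of the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = random.Random(seed)
    # Rows of W ~ N(0, sigma^-2 I), the kernel's spectral (Fourier) density.
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(input_dim)]
         for _ in range(num_features)]
    # Phases b ~ Uniform[0, 2*pi).
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return W, b


def rff_map(x, W, b):
    """z(x) = sqrt(2/D) * cos(W x + b), so that z(x) . z(y) ~= k(x, y)."""
    scale = math.sqrt(2.0 / len(W))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
            for w, bj in zip(W, b)]


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel, for comparison with the approximation."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


x = [0.5, -1.0, 0.25]
y = [0.0, -0.5, 1.0]
W, b = sample_rff(input_dim=3, num_features=5000, sigma=1.0, seed=42)
zx, zy = rff_map(x, W, b), rff_map(y, W, b)
approx = sum(a * c for a, c in zip(zx, zy))
exact = gaussian_kernel(x, y, sigma=1.0)  # exp(-0.53125), about 0.588
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The approximation error shrinks roughly like 1/sqrt(D) in the number of features D, which is why reducing the number of random features a model needs matters at speech-recognition scale.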

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing