Kernel Approximation Methods for Speech Recognition
Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha

TL;DR
This paper explores large-scale kernel methods for acoustic modeling in speech recognition. It introduces techniques for feature selection and training monitoring, and reports performance comparable to deep neural networks (DNNs) on four datasets, including TIMIT and Broadcast News.
Contribution
It proposes a feature selection method and a training-monitoring method that improve kernel acoustic models enough to make them competitive with DNNs.
Findings
Kernel models achieved performance comparable to DNNs.
Feature selection reduced the number of random features needed.
Monitoring frame-level metrics improved recognition accuracy.
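As a rough illustration of the frame-level metrics named in the abstract (accuracy and cross-entropy), the sketch below computes both from per-frame class posteriors. The function names, the toy posteriors, and the clipping constant eps are assumptions made for this example; they are not taken from the paper.

```python
import math


def frame_accuracy(posteriors, labels):
    """Fraction of frames whose highest-probability class equals the label."""
    correct = 0
    for probs, label in zip(posteriors, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


def frame_cross_entropy(posteriors, labels, eps=1e-12):
    """Average negative log-probability assigned to the correct class.

    eps is a hypothetical clipping constant that avoids log(0).
    """
    total = 0.0
    for probs, label in zip(posteriors, labels):
        total -= math.log(max(probs[label], eps))
    return total / len(labels)


# Toy example: three frames, three classes (made-up numbers).
toy_posteriors = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
toy_labels = [0, 1, 0]

toy_accuracy = frame_accuracy(toy_posteriors, toy_labels)
toy_cross_entropy = frame_cross_entropy(toy_posteriors, toy_labels)
```

Both quantities are computed per frame and averaged, so they can be tracked on held-out data during training without running a full recognition pass.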
Abstract
We study large-scale kernel methods for acoustic modeling in speech recognition and compare their performance to deep neural networks (DNNs). We perform experiments on four speech recognition datasets, including the TIMIT and Broadcast News benchmark tasks, and compare these two types of models on frame-level performance metrics (accuracy, cross-entropy), as well as on recognition metrics (word/character error rate). In order to scale kernel methods to these large datasets, we use the random Fourier feature method of Rahimi and Recht (2007). We propose two novel techniques for improving the performance of kernel acoustic models. First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection. The method is able to explore a large number of non-linear features while maintaining a compact model…
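The random Fourier feature idea referenced in the abstract can be sketched in a few lines. The example below assumes a Gaussian (RBF) kernel, which the abstract does not specify, and uses hypothetical names (make_rff_map, bandwidth, num_features). It is a minimal standard-library illustration of the general technique, not the authors' implementation.

```python
import math
import random


def make_rff_map(input_dim, num_features, bandwidth, seed=0):
    """Build a random feature map z(x) such that z(x) . z(y) approximates
    the Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    rng = random.Random(seed)
    # Each projection vector has i.i.d. N(0, 1 / bandwidth^2) entries.
    weights = [
        [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(input_dim)]
        for _ in range(num_features)
    ]
    # Each phase offset is drawn uniformly from [0, 2*pi].
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def gaussian_kernel(x, y, bandwidth):
    """Exact Gaussian kernel value, used here only for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Compare the approximation with the exact kernel on two toy vectors.
x = [0.1, 0.2, 0.3]
y = [0.0, -0.1, 0.4]
rff = make_rff_map(input_dim=3, num_features=4000, bandwidth=1.0, seed=0)
approx_value = dot(rff(x), rff(y))
exact_value = gaussian_kernel(x, y, bandwidth=1.0)
```

Once the inputs are mapped through the random features, a linear model trained on them stands in for the kernel model, which is what lets the approach scale to large datasets. The approximation error shrinks as num_features grows, so a method that reduces the number of features required makes the resulting model more compact.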
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
