Exploiting Hybrid Models of Tensor-Train Networks for Spoken Command Recognition
Jun Qi, Javier Tejedor

TL;DR
This paper presents a hybrid tensor-train neural network architecture for spoken command recognition that substantially reduces model size while maintaining accuracy, evaluated on the Google Speech Commands dataset.
Contribution
It introduces a CNN+(TT-DNN) model that replaces fully connected layers with tensor-train layers, achieving parameter reduction without sacrificing performance.
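A TT layer stores a dense N x M weight matrix implicitly, as a chain of small 4-way cores, and applies it to an input without ever forming the matrix. The NumPy sketch below illustrates that idea under our own assumptions. The core layout (r_{k-1}, n_k, m_k, r_k), the helper names tt_linear and tt_to_full, and all mode sizes and ranks are hypothetical and not taken from the paper. Bias and activation are omitted.

```python
import numpy as np
from math import prod

def tt_linear(x, cores):
    """Compute y = x @ W for a TT-matrix W without materialising W.

    Assumed layout: x is (batch, N) with N = prod(n_k); cores[k] is
    (r_{k-1}, n_k, m_k, r_k) with r_0 = r_d = 1. Returns (batch, M), M = prod(m_k).
    """
    B = x.shape[0]
    # state layout: (batch, output index so far, current rank, remaining inputs)
    state = x.reshape(B, 1, 1, -1)
    for core in cores:
        r_prev, n_k, m_k, r_k = core.shape
        P = state.shape[1]
        state = state.reshape(B, P, r_prev, n_k, -1)  # split off input mode n_k
        # sum over the incoming rank index and the input mode n_k
        state = np.einsum("bprnq,rnms->bpmsq", state, core)
        state = state.reshape(B, P * m_k, r_k, -1)   # absorb m_k into the output index
    return state.reshape(B, -1)

def tt_to_full(cores):
    """Materialise the dense (N, M) matrix; used here only as a correctness check."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    d = len(cores)
    full = full[0, ..., 0]  # drop the boundary ranks r_0 = r_d = 1
    perm = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))  # (n_1..n_d, m_1..m_d)
    n_in = prod(c.shape[1] for c in cores)
    return full.transpose(perm).reshape(n_in, -1)

# Hypothetical sizes, not the paper's configuration: 256 inputs -> 64 outputs.
rng = np.random.default_rng(0)
in_modes, out_modes, ranks = (4, 8, 8), (4, 4, 4), (1, 3, 3, 1)
cores = [rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
         for k in range(3)]
x = rng.standard_normal((2, 256))
y = tt_linear(x, cores)
print(y.shape)                                # (2, 64)
print(np.allclose(y, x @ tt_to_full(cores)))  # True
```

Contracting one core at a time keeps the work proportional to the core sizes rather than to N x M; tt_to_full exists only to compare against an explicit matrix product.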
Findings
Achieves 96.31% accuracy with 4x fewer parameters than the CNN baseline.
Reaches 97.2% accuracy with a larger parameter budget.
Demonstrates effective trade-off between model complexity and accuracy.
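The parameter savings come from the TT cores being much smaller than the dense weight they replace, and the TT ranks set how much is saved. A rough per-layer counter, assuming a hypothetical 256 -> 256 fully connected layer factored as 4 x 8 x 8 on both sides and a few example ranks (none of these are the paper's settings):

```python
from math import prod

def dense_params(n_in, n_out, bias=True):
    """Weights (+ bias) of an ordinary fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

def tt_params(in_modes, out_modes, ranks, bias=True):
    """Parameters of a TT layer: one (r_{k-1}, n_k, m_k, r_k) core per mode,
    plus a dense bias (an assumption; TT layers are often built this way)."""
    d = len(in_modes)
    assert len(out_modes) == d and len(ranks) == d + 1
    assert ranks[0] == ranks[-1] == 1
    core_params = sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
                      for k in range(d))
    return core_params + (prod(out_modes) if bias else 0)

# Hypothetical layer and ranks, chosen only to show the trend.
dense = dense_params(256, 256)
for ranks in [(1, 2, 2, 1), (1, 4, 4, 1), (1, 8, 8, 1)]:
    tt = tt_params((4, 8, 8), (4, 8, 8), ranks)
    print(ranks, tt, f"{dense / tt:.1f}x smaller")
```

Raising the ranks buys back capacity at the cost of parameters, which is the kind of knob behind a size/accuracy trade-off. In a full model the convolutional layers are left unchanged, so the overall reduction is smaller than these per-layer ratios.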
Abstract
This work aims to design a low-complexity spoken command recognition (SCR) system by considering different trade-offs between the number of model parameters and classification accuracy. More specifically, we exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SCR pipeline. Our command recognition system, namely CNN+(TT-DNN), is composed of convolutional layers at the bottom for spectral feature extraction and TT layers at the top for command classification. Compared with a traditional end-to-end CNN baseline for SCR, our proposed CNN+(TT-DNN) model replaces the fully connected (FC) layers with TT layers, which can substantially reduce the number of model parameters while maintaining the baseline performance of the CNN model. We initialize the CNN+(TT-DNN) model either randomly or from a well-trained CNN+DNN, and assess the CNN+(TT-DNN) models on…
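The abstract mentions initializing CNN+(TT-DNN) from a well-trained CNN+DNN but, in the text available here, does not say how the dense FC weights are mapped to TT cores. One standard option is TT-SVD: reshape the weight into a higher-order tensor and split it with a sequence of truncated SVDs. The NumPy sketch below assumes that approach; the function names, the 64 x 64 stand-in weight, the mode sizes and the rank cap are all hypothetical.

```python
import numpy as np

def tt_svd_matrix(W, in_modes, out_modes, max_rank):
    """Factor a dense (N, M) weight into TT cores of shape (r_{k-1}, n_k, m_k, r_k)
    using sequential truncated SVDs (TT-SVD). Assumed, not the paper's procedure."""
    d = len(in_modes)
    T = W.reshape(*in_modes, *out_modes)
    # interleave axes to (n_1, m_1, n_2, m_2, ..., n_d, m_d)
    T = T.transpose([ax for k in range(d) for ax in (k, d + k)])
    cores, r_prev, C = [], 1, T.reshape(1, -1)
    for k in range(d - 1):
        n_k, m_k = in_modes[k], out_modes[k]
        C = C.reshape(r_prev * n_k * m_k, -1)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, S.size)                  # truncate the TT rank
        cores.append(U[:, :r].reshape(r_prev, n_k, m_k, r))
        C = S[:r, None] * Vt[:r]                   # carry the remainder to the next core
        r_prev = r
    cores.append(C.reshape(r_prev, in_modes[-1], out_modes[-1], 1))
    return cores

def tt_to_full(cores):
    """Rebuild the dense (N, M) matrix from the cores, to measure the error."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    d = len(cores)
    full = full[0, ..., 0]
    perm = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))
    n_in = int(np.prod([c.shape[1] for c in cores]))
    return full.transpose(perm).reshape(n_in, -1)

# Hypothetical stand-in for a trained FC weight; not data from the paper.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
for max_rank in (4, 16):
    cores = tt_svd_matrix(W, (4, 4, 4), (4, 4, 4), max_rank)
    err = np.linalg.norm(W - tt_to_full(cores)) / np.linalg.norm(W)
    print(max_rank, [c.shape for c in cores], f"rel. error {err:.2e}")
```

With a rank cap at least as large as the exact TT ranks (16 here) the factorization reproduces W; smaller caps trade approximation error for fewer parameters.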
Taxonomy
Topics: Speech Recognition and Synthesis · Tensor decomposition and applications · Topic Modeling
