Exploiting Hybrid Models of Tensor-Train Networks for Spoken Command Recognition

Jun Qi; Javier Tejedor

arXiv:2201.10609·cs.SD·January 27, 2022

TL;DR

This paper presents a hybrid tensor-train neural network architecture for spoken command recognition that significantly reduces model size while maintaining high accuracy, demonstrating effectiveness on the Google Speech Command Dataset.

Contribution

The paper introduces a CNN+(TT-DNN) model that replaces the fully connected (FC) layers of a CNN baseline with tensor-train (TT) layers, reducing the number of model parameters while maintaining the baseline's accuracy.
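To make the source of the parameter savings concrete, here is a minimal sketch that compares the weight count of a dense FC layer with that of the same layer stored in the standard TT-matrix format, where core k has shape (r_{k-1}, m_k, n_k, r_k). The layer size, mode factorizations, and TT ranks below are illustrative assumptions and are not taken from the paper; biases are ignored.

```python
from math import prod


def fc_params(in_modes, out_modes):
    """Weights in a dense layer of size prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Weights in the TT-matrix format: core k has shape
    (ranks[k], in_modes[k], out_modes[k], ranks[k + 1])."""
    assert len(in_modes) == len(out_modes) == len(ranks) - 1
    assert ranks[0] == 1 and ranks[-1] == 1  # boundary ranks are always 1
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Hypothetical 1024 -> 256 layer (values chosen only for illustration).
in_modes = (4, 8, 8, 4)    # 4 * 8 * 8 * 4 = 1024 inputs
out_modes = (4, 4, 4, 4)   # 4 * 4 * 4 * 4 = 256 outputs
ranks = (1, 2, 2, 2, 1)    # assumed TT ranks

dense = fc_params(in_modes, out_modes)
tt = tt_params(in_modes, out_modes, ranks)
print(dense, tt)  # 262144 320
```

Larger TT ranks increase both the parameter count and the expressive capacity of the layer, which is the knob behind the size/accuracy trade-off described in the findings.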

Findings

01. Achieves 96.31% accuracy with 4x fewer parameters than the CNN baseline.
02. Reaches 97.2% accuracy when the number of parameters is increased.
03. Demonstrates an effective trade-off between model complexity and accuracy.

Abstract

This work aims to design a low-complexity spoken command recognition (SCR) system by considering different trade-offs between the number of model parameters and classification accuracy. More specifically, we exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SCR pipeline. Our command recognition system, namely CNN+(TT-DNN), is composed of convolutional layers at the bottom for spectral feature extraction and TT layers at the top for command classification. Compared with a traditional end-to-end CNN baseline for SCR, our proposed CNN+(TT-DNN) model replaces fully connected (FC) layers with TT ones, and it can substantially reduce the number of model parameters while maintaining the baseline performance of the CNN model. We initialize the CNN+(TT-DNN) model in a randomized manner or based on a well-trained CNN+DNN, and assess the CNN+(TT-DNN) models on…
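As a rough illustration of what replacing an FC layer with a TT layer involves, the NumPy sketch below applies a TT-matrix to a batch of inputs core by core, without ever forming the dense weight matrix, and checks the result against an explicit reconstruction. This is a generic TT-matrix forward pass under assumed shapes and ranks, not the authors' implementation; the function names `tt_to_matrix` and `tt_forward` are hypothetical, and bias and nonlinearity are omitted.

```python
import numpy as np


def tt_to_matrix(cores):
    """Rebuild the dense (M x N) weight matrix from TT cores.
    Core k has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_K = 1."""
    full = cores[0]
    for core in cores[1:]:
        # Contract the shared rank index, then merge input and output modes.
        full = np.einsum('aijb,bklc->aikjlc', full, core)
        a, i, k, j, l, c = full.shape
        full = full.reshape(a, i * k, j * l, c)
    return full[0, :, :, 0]


def tt_forward(x, cores):
    """Compute x @ W for W given in TT-matrix format, one core at a time."""
    batch = x.shape[0]
    # Axes: (batch, outputs done, rank, current input mode, remaining inputs)
    state = x.reshape(batch, 1, 1, cores[0].shape[1], -1)
    for k, core in enumerate(cores):
        # Sum over the incoming rank and the current input mode.
        state = np.einsum('bormt,rmns->bonst', state, core)
        b, o, n, s, t = state.shape
        if k + 1 < len(cores):
            m_next = cores[k + 1].shape[1]
            state = state.reshape(b, o * n, s, m_next, t // m_next)
    return state.reshape(batch, -1)


# Assumed toy sizes: 24 inputs -> 12 outputs, TT ranks (1, 2, 3, 1).
rng = np.random.default_rng(0)
in_modes, out_modes, ranks = (2, 3, 4), (3, 2, 2), (1, 2, 3, 1)
cores = [
    rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
    for k in range(len(in_modes))
]
x = rng.standard_normal((5, 24))

y = tt_forward(x, cores)
y_dense = x @ tt_to_matrix(cores)
print(y.shape, np.allclose(y, y_dense))
```

The dense reconstruction in `tt_to_matrix` also hints at how a TT layer could be compared against, or derived from, an existing FC weight matrix, although obtaining cores from trained weights requires a TT decomposition step that this sketch does not cover.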

Taxonomy

Topics: Speech Recognition and Synthesis · Tensor decomposition and applications · Topic Modeling