Temporal Feedback Convolutional Recurrent Neural Networks for Speech Command Recognition

Taejun Kim; Juhan Nam

arXiv:1911.01803 · eess.AS · September 20, 2022

TL;DR

This paper introduces a neural network architecture for speech command recognition that adds temporal feedback control, inspired by gain control in the human auditory system, and reports slightly better performance than SENets and other CNN-based models.

Contribution

The paper extends squeeze-and-excitation networks (SENets) with a recurrent module that feeds top-layer features back to channel-wise feature activations in lower layers.

Findings

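01 The proposed model slightly outperforms SENets and other CNN-based models on speech command recognition.

02 Temporal feedback modulates channel-wise feature scaling in lower layers, which accompanies the gain in recognition accuracy.

03 Failure analysis and visualization of the channel-wise feature scaling are used to examine where the improvement comes from.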

Abstract

End-to-end learning models using raw waveforms as input have shown superior performance in many audio recognition tasks. However, most model architectures are based on convolutional neural networks (CNNs), which were mainly developed for visual recognition tasks. In this paper, we propose an extension of squeeze-and-excitation networks (SENets) which adds temporal feedback control from the top-layer features to channel-wise feature activations in lower layers using a recurrent module. This is analogous to the adaptive gain control mechanism of outer hair cells in the human auditory system. We apply the proposed model to speech command recognition and show that it slightly outperforms the SENets and other CNN-based models. We also investigate the details of the performance improvement by conducting failure analysis and visualizing the channel-wise feature scaling induced by the temporal…
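The feedback idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class `TemporalFeedbackGain`, the helpers `squeeze`, `scale_channels`, and `run`, the simple tanh recurrence (used here in place of whatever recurrent unit the paper uses), the random weights, and the use of the scaled frame itself as a stand-in for top-layer features are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TemporalFeedbackGain:
    """Hypothetical sketch: a recurrent state summarizes top-layer features
    over time and emits one gain in (0, 1) per lower-layer channel."""

    def __init__(self, top_dim, hidden_dim, num_channels, seed=0):
        rng = random.Random(seed)
        self.W_in = rand_matrix(hidden_dim, top_dim, rng)
        self.W_rec = rand_matrix(hidden_dim, hidden_dim, rng)
        self.W_out = rand_matrix(num_channels, hidden_dim, rng)
        self.h = [0.0] * hidden_dim  # recurrent state carried across frames

    def step(self, top_features):
        # Simple tanh recurrence (an assumption, not the paper's exact unit).
        a = matvec(self.W_in, top_features)
        b = matvec(self.W_rec, self.h)
        self.h = [math.tanh(x + y) for x, y in zip(a, b)]
        # Excitation: squash to (0, 1) so each value acts as a channel gain.
        return [sigmoid(z) for z in matvec(self.W_out, self.h)]


def squeeze(frame):
    # Squeeze step: average each channel over time within the frame.
    return [sum(ch) / len(ch) for ch in frame]


def scale_channels(frame, gains):
    # Channel-wise scaling: every sample in a channel shares one gain.
    return [[g * x for x in ch] for ch, g in zip(frame, gains)]


def run(frames, feedback):
    # frames: list of frames, each a list of channels, each a list of samples.
    gains = [1.0] * len(frames[0])  # no feedback is available for the first frame
    outputs = []
    for frame in frames:
        scaled = scale_channels(frame, gains)
        outputs.append(scaled)
        # Stand-in for top-layer features: the squeezed, scaled frame.
        gains = feedback.step(squeeze(scaled))
    return outputs
```

The point of the sketch is the data flow: gains computed from what the network saw up to frame t are applied to the channels of frame t+1, so the scaling depends on temporal context rather than on the current frame alone, as it would in a plain squeeze-and-excitation block.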

Code & Models

Repositories

tae-jun/temporal-feedback-crnn (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing