Deep Learning Approaches for Understanding Simple Speech Commands

Roman A. Solovyev; Maxim Vakhrushev; Alexander Radionov; Vladimir Aliev; Alexey A. Shvets

arXiv:1810.02364 · cs.SD · October 8, 2018

TL;DR

This paper explores various sound representations and convolutional neural network architectures for classifying simple speech commands, achieving high accuracy in a Kaggle challenge.

Contribution

It introduces effective sound representations and CNN models for speech command classification, demonstrating competitive performance in a major challenge.

Findings

01 Identified sound representations well suited to CNN classification

02 Achieved a top-10 placement in the TensorFlow Speech Recognition Challenge on Kaggle

03 Compared 1D and 2D CNN approaches for sound classification
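
To make the 1D versus 2D distinction concrete, here is a minimal sketch in plain Python of the two operations involved: a 1D convolution sliding along raw wave samples, and a 2D convolution sliding over a time-frequency grid such as a spectrogram. The function names, the toy inputs, and the kernels are illustrative assumptions, not the architectures used in the paper.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation: the kernel slides along the time axis only."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def conv2d(image, kernel):
    """Valid 2D cross-correlation: the kernel slides along both axes of a grid."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# Hypothetical raw waveform (1D input) and a difference kernel.
wave = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
out_1d = conv1d(wave, [1.0, 0.0, -1.0])

# Hypothetical spectrogram-like grid (rows = time frames, columns = frequency bins)
# and a 2x2 summing kernel.
spec_like = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
out_2d = conv2d(spec_like, [[1, 1], [1, 1]])

print(len(out_1d))                  # output length along time
print(len(out_2d), len(out_2d[0]))  # output size along time and frequency
```

The 1D layer sees only neighbouring samples in time, while the 2D layer sees a patch spanning both time and frequency, which is why the choice of sound representation and the choice of network type go together.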

Abstract

Automatic classification of sound commands is becoming increasingly important, especially for mobile and embedded devices. Many of these devices contain both cameras and microphones, and companies that develop them would like to use the same technology for both of these classification tasks. One way of achieving this is to represent sound commands as images, and use convolutional neural networks when classifying images as well as sounds. In this paper we consider several approaches to the problem of sound classification that we applied in the TensorFlow Speech Recognition Challenge organized by the Google Brain team on the Kaggle platform. Here we show different representations of sounds (Wave frames, Spectrograms, Mel-Spectrograms, MFCCs) and apply several 1D and 2D convolutional neural networks in order to get the best performance. Our experiments show that we found appropriate sound…
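
As an illustration of how a sound command can be turned into an image-like input, the sketch below frames a waveform and computes a magnitude spectrogram with a naive windowed DFT, using only the Python standard library. The frame length, hop size, sample rate, and the synthetic 1 kHz tone are assumptions chosen for the example, not the settings from the paper; a real pipeline would typically use an FFT from a numerical library and could add a mel filterbank or MFCC step on top.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a waveform into overlapping frames (the 'wave frames' view)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def hann(n):
    """Symmetric Hann window to reduce spectral leakage at frame edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative frequency bins of a windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_len=64, hop=32):
    """2D grid: one row per time frame, one column per frequency bin."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


# Synthetic input (assumption): a 1 kHz sine sampled at 8 kHz, 512 samples long.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = spectrogram(tone)
# Each bin spans sample_rate / frame_len = 125 Hz, so the tone should peak near 1000 / 125.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

The resulting grid can be fed to a 2D convolutional network in the same way as a single-channel image, whereas the raw frames from `frame_signal` correspond to the input a 1D network would consume.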
