Speech and Speaker Recognition from Raw Waveform with SincNet

Mirco Ravanelli; Yoshua Bengio

arXiv:1812.05920·eess.AS·February 26, 2019·28 cites

TL;DR

This paper introduces SincNet, a CNN architecture for speech and speaker recognition that operates directly on raw audio waveforms. Its first layer is built from parametrized sinc functions, which encourages the network to learn meaningful filters.

Contribution

SincNet is a novel CNN that learns band-pass filter parameters directly from data, improving training speed, accuracy, and efficiency over standard CNNs in speech tasks.
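
As a rough illustration of what "band-pass filter parameters" can mean here: a band-pass filter can be written as the difference of two ideal low-pass filters, so two cutoff values are enough to define the whole impulse response. The symbols below ($f_1$ and $f_2$ for the normalized low and high cutoffs, $n$ for the sample index) are notation assumed for this sketch, and in practice the truncated response is usually multiplied by a window.

```latex
g[n; f_1, f_2] = 2 f_2 \,\mathrm{sinc}(2\pi f_2 n) - 2 f_1 \,\mathrm{sinc}(2\pi f_1 n),
\qquad \mathrm{sinc}(x) = \frac{\sin x}{x}, \qquad 0 \le f_1 < f_2 \le \tfrac{1}{2}
```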

Findings

1. Faster convergence compared to standard CNNs
2. Improved recognition performance
3. More computationally efficient model
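
One way to see where an efficiency gain could come from is to count first-layer parameters. The sketch below compares a standard convolution, which learns one weight per filter tap, with a layer that learns only two cutoffs per filter. The layer sizes (80 filters of 251 taps) are illustrative assumptions, biases are ignored, and the count says nothing about the rest of the network or about runtime.

```python
def first_layer_params(num_filters, kernel_size):
    """Count learnable first-layer weights for the two parameterizations."""
    # Standard CNN: every tap of every filter is a free parameter.
    standard = num_filters * kernel_size
    # Sinc-style layer: only a low and a high cutoff per filter.
    sinc = num_filters * 2
    return standard, sinc


# Hypothetical layer sizes chosen for illustration.
standard, sinc = first_layer_params(num_filters=80, kernel_size=251)
print(f"standard: {standard} parameters, sinc-based: {sinc} parameters")
```

Note that the sinc-based count does not depend on the filter length, so longer filters add no learnable parameters to this layer.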

Abstract

Deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones. A recent trend in speech and speaker recognition consists of discovering these representations directly from raw audio samples. Unlike standard hand-crafted features such as MFCCs or FBANK, the raw waveform can potentially help neural networks discover better and more customized representations. The high-dimensional raw inputs, however, can make training significantly more challenging. This paper summarizes our recent efforts to develop a neural architecture that efficiently processes speech from audio waveforms. In particular, we propose SincNet, a novel Convolutional Neural Network (CNN) that encourages the first layer to discover meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the…
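
To make the idea of a filter built from parametrized sinc functions concrete, here is a minimal NumPy sketch, not the authors' implementation. It builds one band-pass kernel from just two cutoff frequencies as the difference of two windowed low-pass sinc filters. The function names, the 251-tap length, the 16 kHz sample rate, and the Hamming window are assumptions made for this example. In a trainable layer the two cutoffs would be the learned quantities; here they are fixed numbers.

```python
import numpy as np


def sinc_bandpass(f1_hz, f2_hz, kernel_size=251, sample_rate=16000):
    """Build a windowed band-pass FIR kernel from two cutoff frequencies."""
    # Normalized cutoffs in cycles per sample.
    f1 = f1_hz / sample_rate
    f2 = f2_hz / sample_rate
    # Sample index centered on zero so the kernel is symmetric.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # np.sinc(x) is sin(pi*x)/(pi*x), so 2*f*np.sinc(2*f*n) is an
    # ideal low-pass impulse response with cutoff f.
    low_pass_f2 = 2 * f2 * np.sinc(2 * f2 * n)
    low_pass_f1 = 2 * f1 * np.sinc(2 * f1 * n)
    # Difference of two low-pass filters is a band-pass filter;
    # the Hamming window softens the effect of truncation.
    return (low_pass_f2 - low_pass_f1) * np.hamming(kernel_size)


def gain_at(kernel, freq_hz, sample_rate=16000):
    """Magnitude of the kernel's frequency response at one frequency."""
    n = np.arange(len(kernel))
    phase = np.exp(-2j * np.pi * freq_hz / sample_rate * n)
    return abs(np.sum(kernel * phase))


# Hypothetical pass band of 300 Hz to 3000 Hz.
kernel = sinc_bandpass(300.0, 3000.0)
print("in-band gain:", gain_at(kernel, 1500.0))
print("out-of-band gain:", gain_at(kernel, 6000.0))
```

The in-band gain should come out close to 1 and the out-of-band gain close to 0, which shows that the shape of the whole 251-tap kernel is determined by the two cutoff values alone.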

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing