Speaker Recognition from Raw Waveform with SincNet

Mirco Ravanelli; Yoshua Bengio

arXiv:1808.00158·eess.AS·August 12, 2019

5 Repos 1 Models 1 Datasets

TL;DR

This paper introduces SincNet, a CNN architecture for speaker recognition from raw waveforms whose first layer learns band-pass filters defined only by their cutoff frequencies. It converges faster and reaches higher accuracy than a standard CNN.

Contribution

SincNet builds its first convolutional layer from parametrized sinc functions that implement band-pass filters, so the network learns only each filter's low and high cutoff frequencies instead of every filter element.

Findings

01. SincNet converges faster than standard CNNs.

02. SincNet achieves higher accuracy in speaker identification and verification.

03. The filter bank is specifically tuned for speaker recognition tasks.

Abstract

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have been recently obtained with Convolutional Neural Networks (CNNs) when fed by raw speech samples directly. Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants. Proper design of the neural network is crucial to achieve this goal. This paper proposes a novel CNN architecture, called SincNet, that encourages the first convolutional layer to discover more meaningful filters. SincNet is based on parametrized sinc functions, which implement band-pass filters. In contrast to standard CNNs, that learn all elements of each filter, only low and high cutoff frequencies are…
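To make the band-pass parametrization described in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It builds one filter as the difference of two low-pass sinc filters, so the whole kernel is determined by a low and a high cutoff frequency. The function name `sinc_bandpass`, the kernel length, the sample rate, the example cutoffs, and the Hamming window are illustrative assumptions that the text above does not specify.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Return a band-pass FIR kernel defined only by its two cutoff frequencies.

    Hypothetical helper for illustration: kernel_size, sample_rate and the
    Hamming window are assumptions, not values taken from the text above.
    """
    # Cutoffs expressed in cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # Symmetric time axis centered on zero (kernel_size should be odd).
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # np.sinc(x) is sin(pi*x)/(pi*x), so 2*f*np.sinc(2*f*n) is an ideal
    # low-pass filter with cutoff f; subtracting two of them leaves the band
    # between f1 and f2.
    band = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Taper the truncated kernel to reduce ripple (assumed window choice).
    return band * np.hamming(kernel_size)


# Example: a filter passing roughly 500 Hz to 2000 Hz.
kernel = sinc_bandpass(500.0, 2000.0)
# Zero-pad the FFT to 16000 points so that bin k corresponds to k Hz.
response = np.abs(np.fft.rfft(kernel, n=16000))
```

In a trainable layer the two cutoffs would be the learnable parameters and the kernel would be rebuilt from them on every forward pass. In this sketch they are fixed numbers, and `response` can be inspected to confirm that the gain is close to one inside the band and close to zero outside it.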

Code & Models

Models

🤗 D4ve-R/sincnet · model · 5 dl

Datasets

confit/librispeech-sid-parquet · dataset · 31 dl
