Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

Arindam Jati; Panayiotis Georgiou

arXiv:1802.07860·cs.SD·July 18, 2019

TL;DR

This paper introduces Neural Predictive Coding (NPC), an unsupervised learning framework that uses convolutional neural networks to extract speaker-specific features from unlabeled audio, even when that audio contains non-speech events and multi-speaker streams.

Contribution

The paper proposes a novel unsupervised method, NPC, leveraging a short-term active-speaker stationarity hypothesis and siamese networks to learn speaker embeddings from unlabeled data.

Findings

01

NPC embeddings perform best in short-duration speaker identification.

02

NPC provides complementary information to i-vectors in full-utterance scenarios.

03

In large-scale verification, NPC compares favorably with supervised methods.

Abstract

Learning speaker-specific features is vital in many applications, such as speaker recognition, diarization, and speech recognition. This paper provides a novel approach, which we term Neural Predictive Coding (NPC), to learn speaker-specific characteristics in a completely unsupervised manner from large amounts of unlabeled training data that even contain many non-speech events and multi-speaker audio streams. The NPC framework exploits the proposed short-term active-speaker stationarity hypothesis, which assumes that two temporally close short speech segments belong to the same speaker; a common representation that encodes the commonalities of both segments should therefore capture the vocal characteristics of that speaker. We train a convolutional deep siamese network to produce "speaker embeddings" by learning to separate 'same' vs. 'different' speaker pairs which are generated from an…
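The pair generation implied by the stationarity hypothesis can be illustrated with a minimal sketch. It is not the authors' implementation. The names `sample_pair`, `random_window`, `seg_len`, and `max_gap` are hypothetical, and drawing 'different' pairs from two distinct streams is an assumption, since the truncated abstract does not say how those pairs are formed. Streams are plain Python lists standing in for audio frames.

```python
import random


def random_window(stream, seg_len, rng):
    """Pick a random contiguous window of seg_len frames from one stream."""
    start = rng.randint(0, len(stream) - seg_len)  # randint is inclusive
    return stream[start:start + seg_len]


def sample_pair(streams, seg_len, max_gap, same, rng):
    """Return (segment_a, segment_b, label); 1 = 'same', 0 = 'different'."""
    if len(streams) < 2 or any(len(s) < 2 * seg_len for s in streams):
        raise ValueError("need >= 2 streams, each at least 2 * seg_len long")
    if same:
        # Stationarity hypothesis: two temporally close segments of one
        # stream are assumed to share the active speaker.
        s = rng.choice(streams)
        gap = rng.randint(0, min(max_gap, len(s) - 2 * seg_len))
        start = rng.randint(0, len(s) - 2 * seg_len - gap)
        second = start + seg_len + gap
        return s[start:start + seg_len], s[second:second + seg_len], 1
    # Assumption (not from the source): segments taken from two distinct
    # streams are treated as coming from different speakers.
    i, j = rng.sample(range(len(streams)), 2)
    a = random_window(streams[i], seg_len, rng)
    b = random_window(streams[j], seg_len, rng)
    return a, b, 0


# Toy usage: each frame is (stream_id, time_index) so pairs are easy to inspect.
rng = random.Random(0)
streams = [[(sid, t) for t in range(50)] for sid in range(3)]
seg_a, seg_b, label = sample_pair(streams, seg_len=5, max_gap=3, same=True, rng=rng)
```

No speaker labels are used at any point. The only supervision signal is temporal proximity, which is why the approach can train on unlabeled audio. Labels produced this way are noisy (a speaker change can fall between two nearby segments, and two streams can share a speaker), so a siamese network trained on such pairs has to tolerate some mislabeled examples.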
