Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing

Benjamin van Niekerk; Leanne Nortje; Matthew Baas; Herman Kamper

arXiv:2108.00917·eess.AS·August 3, 2021

TL;DR

This paper investigates how contrastive predictive coding (CPC) features encode speaker information and proposes a speaker normalization step for acoustic unit discovery, achieving some of the best results in the ZeroSpeech2021 Challenge.

Contribution

It shows that the per-utterance mean of CPC features captures much of the speaker information, and it introduces a speaker normalization step, based on standardizing the features, that improves K-means acoustic unit discovery for zero-resource speech tasks.

Findings

01 Per-utterance mean of CPC features captures speaker info

02 Standardizing features removes speaker information

03 Normalization improves zero-resource speech task performance
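
The first two findings can be illustrated with a minimal sketch. It uses synthetic features rather than real CPC outputs, and it assumes that a speaker shows up as a constant offset added to every frame and that standardization is applied per utterance; neither assumption is confirmed by this page. The helper names (`utterance_mean`, `cosine_similarity`, `standardize`, `fake_utterance`) are hypothetical and are not taken from the bshall/cpc repository.

```python
import numpy as np


def utterance_mean(features):
    """Average a (frames, dims) feature matrix over time."""
    return features.mean(axis=0)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def standardize(features, eps=1e-8):
    """Scale each dimension of one utterance to zero mean and unit variance."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


rng = np.random.default_rng(0)
dims = 16

# Assumption: each speaker contributes a fixed offset to every frame.
speaker_a = 3.0 * rng.normal(size=dims)
speaker_b = 3.0 * rng.normal(size=dims)


def fake_utterance(offset, frames=200):
    """Synthetic stand-in for CPC features: speaker offset plus frame noise."""
    return offset + rng.normal(size=(frames, dims))


utt_a1 = fake_utterance(speaker_a)
utt_a2 = fake_utterance(speaker_a)
utt_b1 = fake_utterance(speaker_b)

# Comparing utterance means acts as a crude speaker verification score.
same = cosine_similarity(utterance_mean(utt_a1), utterance_mean(utt_a2))
diff = cosine_similarity(utterance_mean(utt_a1), utterance_mean(utt_b1))
print(f"same speaker: {same:.3f}, different speakers: {diff:.3f}")

# After standardization the utterance mean no longer carries the offset.
residual = np.abs(utterance_mean(standardize(utt_a1))).max()
print(f"largest mean component after standardization: {residual:.2e}")
```

In this toy setting the same-speaker score is higher than the different-speaker score, and the standardized utterance has a mean of approximately zero, so a comparison of means can no longer separate the speakers.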

Abstract

Contrastive predictive coding (CPC) aims to learn representations of speech by distinguishing future observations from a set of negative examples. Previous work has shown that linear classifiers trained on CPC features can accurately predict speaker and phone labels. However, it is unclear how the features actually capture speaker and phonetic information, and whether it is possible to normalize out the irrelevant details (depending on the downstream task). In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent. Concretely, we find that comparing means performs well on a speaker verification task. Next, probing experiments show that standardizing the features effectively removes speaker information. Based on this observation, we propose a speaker normalization step to improve acoustic unit discovery using K-means…
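
A toy sketch of why speaker normalization can help K-means unit discovery follows. It runs on synthetic data, not CPC features, and assumes per-utterance standardization, two "phones", and two "speakers" who differ by a constant offset. The `kmeans` function is a plain Lloyd's algorithm with a deterministic farthest-point initialisation chosen here for reproducibility; it is not the clustering code used by the authors, and all names are hypothetical.

```python
import numpy as np


def standardize(features, eps=1e-8):
    """Zero-mean, unit-variance scaling of one utterance's (frames, dims) features."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + eps)


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(dists)])
    centroids = np.stack(centroids)
    for _ in range(iterations):
        # Assign every frame to its nearest centroid, then recompute centroids.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids


def agreement(clusters, labels):
    """Fraction of frames where two binary labelings agree, up to label swap."""
    match = np.mean(clusters == labels)
    return float(max(match, 1.0 - match))


rng = np.random.default_rng(0)
dims, frames = 4, 100
phone_centres = np.array([-2.0, 2.0])        # assumed phone positions
speaker_offsets = {"A": 6.0, "B": -6.0}      # assumed speaker offsets

utterances, phone_labels, speaker_labels = [], [], []
for speaker, offset in speaker_offsets.items():
    for _ in range(2):
        phones = np.repeat([0, 1], frames // 2)
        feats = (phone_centres[phones][:, None] + offset
                 + 0.3 * rng.normal(size=(frames, dims)))
        utterances.append(feats)
        phone_labels.append(phones)
        speaker_labels.append(np.full(frames, speaker == "B", dtype=int))

phone_labels = np.concatenate(phone_labels)
speaker_labels = np.concatenate(speaker_labels)
raw = np.concatenate(utterances)
normalized = np.concatenate([standardize(u) for u in utterances])

raw_clusters, _ = kmeans(raw, k=2)
norm_clusters, _ = kmeans(normalized, k=2)

print("raw features, agreement with speakers:", agreement(raw_clusters, speaker_labels))
print("raw features, agreement with phones:", agreement(raw_clusters, phone_labels))
print("normalized features, agreement with phones:", agreement(norm_clusters, phone_labels))
```

Because the speaker offset is larger than the phone separation in this construction, the clusters found on raw features follow the speakers, while the clusters found on standardized features follow the phones. Real CPC features are far less cleanly structured, so this only conveys the intuition behind the normalization step.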

Code & Models

Repositories

bshall/cpc
pytorch

Taxonomy

Methods: k-Means Clustering