Probing the Information Encoded in X-vectors

Desh Raj; David Snyder; Daniel Povey; Sanjeev Khudanpur

arXiv:1909.06351 · eess.AS · June 16, 2020

TL;DR

This paper uses simple classifiers to probe what x-vector speaker embeddings encode, including speaker identity, channel, and spoken content, and compares that information with what i-vectors encode.

Contribution

The paper uses simple classifiers as probes to analyze the information encoded in x-vectors, and examines how data augmentation during extractor training affects that information.

Findings

01. X-vectors encode speaker, channel, and spoken content information.

02. X-vectors perform well in speaker verification tasks.

03. Data augmentation influences the information captured by x-vectors.

Abstract

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by x-vector embeddings. We probe these embeddings for information related to the speaker, channel, transcription (sentence, words, phones), and meta information about the utterance (duration and augmentation type), and compare these with the information encoded by i-vectors across a varying number of dimensions. We also study the effect of data augmentation during extractor training on the information captured by x-vectors. Experiments on the RedDots data set show that x-vectors capture spoken content and channel-related information, while performing well on speaker verification tasks.
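
To make the probing idea concrete, here is a minimal sketch of a "simple classifier" probe, assuming a softmax regression trained with plain gradient descent. It is not the authors' setup: the embeddings are synthetic stand-ins generated from random class centers rather than real x-vectors, and the function names, dimensions, and hyperparameters are all hypothetical. The point is only the shape of the experiment: freeze the embeddings, train a small classifier to predict some property from them, and compare held-out accuracy with chance.

```python
import numpy as np


def make_synthetic_embeddings(n_classes=4, n_per_class=200, dim=32, noise=1.0, seed=0):
    """Generate stand-in embeddings: one random center per class plus Gaussian noise.

    These are NOT x-vectors; they only mimic fixed-length vectors that carry
    some information about a label (e.g. a speaker or channel id).
    """
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 1.0, size=(n_classes, dim))
    labels = np.repeat(np.arange(n_classes), n_per_class)
    x = centers[labels] + noise * rng.normal(size=(labels.size, dim))
    return x, labels


def train_softmax_probe(x, y, n_classes, lr=0.1, epochs=200):
    """Fit a linear softmax classifier (the probe) with batch gradient descent."""
    n, d = x.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def probe_accuracy(x, y, w, b):
    """Fraction of examples whose predicted class matches the label."""
    return float(np.mean(np.argmax(x @ w + b, axis=1) == y))


n_classes = 4
x, y = make_synthetic_embeddings(n_classes=n_classes)

# Hold out 20% of the examples so the probe is scored on unseen data.
perm = np.random.default_rng(1).permutation(len(y))
split = int(0.8 * len(y))
train_idx, test_idx = perm[:split], perm[split:]

w, b = train_softmax_probe(x[train_idx], y[train_idx], n_classes)
acc = probe_accuracy(x[test_idx], y[test_idx], w, b)
chance = 1.0 / n_classes

print(f"probe accuracy: {acc:.3f} (chance: {chance:.3f})")
```

A probe accuracy well above chance suggests the embeddings carry information about the probed property; accuracy near chance suggests the information is absent or at least not linearly accessible. With real data, the same loop would be repeated per property (speaker, channel, sentence, and so on) and per embedding type.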
