Binary Speaker Embedding

Lantian Li; Dong Wang; Chao Xing; Kaimin Yu; Thomas Fang Zheng

arXiv:1510.05937 · cs.SD · April 1, 2016

TL;DR

This paper explores binary speaker embedding by transforming i-vectors into binary codes using learned hash functions, achieving comparable or better speaker verification and identification results with reduced memory and computation costs.

Contribution

The paper proposes an improved Hamming distance learning approach for binary embedding. Instead of sampling hash functions at random, as simple LSH does, it learns the hash function from speaker data through variable-sized block training.

Findings

01. Binary embedding achieves competitive or even better accuracy on speaker verification and identification.

02. Binary codes reduce memory and computation costs relative to continuous i-vectors.

03. Learned hash functions outperform randomly sampled LSH hash functions on speaker tasks.
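The memory and computation savings can be illustrated with a minimal sketch. The sizes here (1000 speakers, 400-dimensional float32 i-vectors, 512-bit codes) are hypothetical and not taken from the paper, and the helper names `pack_codes` and `hamming_packed` are introduced only for this illustration.

```python
import numpy as np

def pack_codes(bits):
    """Pack an (n, b) array of 0/1 values into (n, ceil(b / 8)) bytes."""
    return np.packbits(bits.astype(np.uint8), axis=1)

def hamming_packed(a, b):
    """Hamming distance between two packed codes: XOR, then count set bits."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

rng = np.random.default_rng(0)
n_speakers, dim, n_bits = 1000, 400, 512  # hypothetical sizes

# Continuous embeddings stored as float32 (stand-ins for real i-vectors).
ivectors = rng.standard_normal((n_speakers, dim)).astype(np.float32)

# Random stand-in binary codes; a real system would derive them from i-vectors.
bits = rng.integers(0, 2, size=(n_speakers, n_bits))
packed = pack_codes(bits)

float_bytes = ivectors.nbytes  # 4 bytes per dimension
code_bytes = packed.nbytes     # 1 bit per code dimension

print(f"float32 i-vectors: {float_bytes} bytes")
print(f"packed binary codes: {code_bytes} bytes")
print("Hamming(0, 1) =", hamming_packed(packed[0], packed[1]))
```

Under these assumed sizes the packed codes occupy 64 bytes per speaker against 1600 bytes for the float vectors, and scoring reduces to an XOR followed by a bit count.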

Abstract

The popular i-vector model represents speakers as low-dimensional continuous vectors (i-vectors), and hence it is a way of continuous speaker embedding. In this paper, we investigate binary speaker embedding, which transforms i-vectors to binary vectors (codes) by a hash function. We start from locality sensitive hashing (LSH), a simple binarization approach where binary codes are derived from a set of random hash functions. A potential problem of LSH is that the randomly sampled hash functions might be suboptimal. We therefore propose an improved Hamming distance learning approach, where the hash function is learned by a variable-sized block training that projects each dimension of the original i-vectors to variable-sized binary codes independently. Our experiments show that binary speaker embedding can deliver competitive or even better results on both speaker verification and…
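As a rough illustration of the LSH baseline described in the abstract, the sketch below binarizes vectors with the sign of random projections, a common form of LSH. The abstract does not state which LSH family the authors use, so this form is an assumption, as are the dimensions and the function names `make_random_planes`, `binarize`, and `hamming`. It does not implement the paper's learned, variable-sized block hash function.

```python
import numpy as np

def make_random_planes(dim, n_bits, seed=0):
    """Sample one random hyperplane (hash function) per output bit."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))

def binarize(x, planes):
    """Map a continuous vector to a binary code: one sign test per hyperplane."""
    return (x @ planes > 0).astype(np.uint8)

def hamming(a, b):
    """Number of bit positions where two codes differ."""
    return int(np.count_nonzero(a != b))

dim, n_bits = 400, 256  # hypothetical i-vector and code sizes
planes = make_random_planes(dim, n_bits)

rng = np.random.default_rng(1)
x = rng.standard_normal(dim)                   # stand-in for an i-vector
x_near = x + 0.05 * rng.standard_normal(dim)   # small perturbation of x
x_far = -x                                     # opposite direction

code_x = binarize(x, planes)
code_near = binarize(x_near, planes)
code_far = binarize(x_far, planes)

print("Hamming to nearby vector:", hamming(code_x, code_near))
print("Hamming to opposite vector:", hamming(code_x, code_far))
```

Vectors separated by a small angle agree on most bits, while the negated vector flips every bit, so Hamming distance between codes tracks angular distance between the original vectors.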

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning