Binary Speaker Embedding
Lantian Li, Dong Wang, Chao Xing, Kaimin Yu, Thomas Fang Zheng

TL;DR
This paper explores binary speaker embedding by transforming i-vectors into binary codes using learned hash functions, achieving comparable or better speaker verification and identification results with reduced memory and computation costs.
Contribution
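The paper introduces an improved Hamming distance learning approach for binary embedding. Instead of sampling hash functions at random as in LSH, it learns the hash function through variable-sized block training, which projects each dimension of the original i-vector independently to a variable-sized binary code.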
Findings
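Binary speaker embedding delivers accuracy that is competitive with, or better than, continuous i-vectors on speaker verification and identification.
Binary codes reduce memory and computation costs relative to continuous i-vectors.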
Learned hash functions outperform random LSH in speaker tasks.
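The memory and computation claim can be illustrated with a minimal sketch. Everything numeric here is an assumption chosen for illustration, not a figure from the paper: 1000 speakers, 400-dimensional float32 vectors standing in for i-vectors, and 512-bit codes filled with random bits. The sketch only shows how packed binary codes compare in size and how Hamming distance reduces to XOR plus bit counting.

```python
import numpy as np

# Assumed sizes for illustration only (not taken from the paper).
n_speakers, dim, n_bits = 1000, 400, 512
rng = np.random.default_rng(0)

# Continuous embeddings: one float32 per dimension.
ivectors = rng.standard_normal((n_speakers, dim)).astype(np.float32)

# Stand-in binary codes (random bits), packed 8 bits per byte.
bits = rng.integers(0, 2, size=(n_speakers, n_bits), dtype=np.uint8)
packed = np.packbits(bits, axis=1)  # shape (n_speakers, n_bits // 8)

def packed_hamming(query, database):
    """Hamming distance from one packed code to every packed code in database."""
    xor = np.bitwise_xor(database, query)          # differing bits are set to 1
    return np.unpackbits(xor, axis=1).sum(axis=1)  # count the set bits per row

distances = packed_hamming(packed[0], packed)
print(ivectors.nbytes, packed.nbytes)
```

Under these assumed sizes the packed codes occupy 64 bytes per speaker versus 1600 bytes for the float32 vectors; the actual savings depend on the code length and vector precision used in practice.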
Abstract
The popular i-vector model represents speakers as low-dimensional continuous vectors (i-vectors), and hence it is a way of continuous speaker embedding. In this paper, we investigate binary speaker embedding, which transforms i-vectors to binary vectors (codes) by a hash function. We start from locality sensitive hashing (LSH), a simple binarization approach where binary codes are derived from a set of random hash functions. A potential problem of LSH is that the randomly sampled hash functions might be suboptimal. We therefore propose an improved Hamming distance learning approach, where the hash function is learned by a variable-sized block training that projects each dimension of the original i-vectors to variable-sized binary codes independently. Our experiments show that binary speaker embedding can deliver competitive or even better results on both speaker verification and…
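As a rough illustration of the LSH baseline described in the abstract, the sketch below binarizes vectors with random hash functions. It assumes the common sign-of-random-projection form of LSH, which the abstract does not spell out, and uses synthetic Gaussian vectors in place of real i-vectors. The function names, the 400-dimensional input, and the 512-bit code length are hypothetical choices. It does not implement the paper's learned variable-sized block training.

```python
import numpy as np

def make_lsh_projection(dim, n_bits, seed=0):
    """Sample n_bits random hyperplanes; each one acts as a random hash function."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))

def binarize(vectors, projection):
    """Map continuous vectors to 0/1 codes by the sign of each random projection."""
    return (np.atleast_2d(vectors) @ projection > 0).astype(np.uint8)

def hamming_distance(code_a, code_b):
    """Number of bit positions where the two codes differ."""
    return int(np.count_nonzero(code_a != code_b))

# Synthetic stand-ins for i-vectors (assumption: real i-vectors would be used instead).
rng = np.random.default_rng(1)
dim, n_bits = 400, 512
projection = make_lsh_projection(dim, n_bits)

enrolled = rng.standard_normal(dim)
same_speaker = enrolled + 0.3 * rng.standard_normal(dim)  # small perturbation
other_speaker = rng.standard_normal(dim)                  # unrelated vector

codes = binarize(np.stack([enrolled, same_speaker, other_speaker]), projection)
d_same = hamming_distance(codes[0], codes[1])
d_other = hamming_distance(codes[0], codes[2])
print(d_same, d_other)
```

Because the probability that two vectors fall on opposite sides of a random hyperplane grows with the angle between them, the perturbed copy should land much closer in Hamming distance than the unrelated vector. A learned hash function, as the paper proposes, would replace the random `projection` with one fitted to speaker data.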
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning
