M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions
Shuai Wang, Pengcheng Zhu, Haizhou Li

TL;DR
This paper introduces M-Vec, a hierarchical speaker embedding method that enables dynamic extraction of low-dimensional sub-embeddings without sacrificing performance, reducing storage and computation costs.
Contribution
The paper presents a novel Matryoshka speaker embedding technique that allows flexible, hierarchical extraction of sub-embeddings while maintaining high speaker verification accuracy.
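Assuming the Matryoshka structure means the sub-embeddings are nested prefixes of the full vector, using a smaller embedding at test time amounts to keeping the leading coordinates and scoring the trial as usual. The sketch below shows prefix truncation followed by cosine scoring. The function names, the example vectors, and the 0.5 threshold are hypothetical and not taken from the paper.

```python
from __future__ import annotations

import math
from typing import Sequence


def sub_embedding(embedding: Sequence[float], dim: int) -> list[float]:
    """Return the leading `dim` coordinates, i.e. the nested (Matryoshka) prefix."""
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be in 1..{len(embedding)}, got {dim}")
    return list(embedding[:dim])


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score a zero vector")
    return dot / (norm_a * norm_b)


def verify(enroll: Sequence[float], test: Sequence[float], dim: int,
           threshold: float) -> tuple[float, bool]:
    """Score a trial using only the first `dim` dimensions of both embeddings."""
    score = cosine_score(sub_embedding(enroll, dim), sub_embedding(test, dim))
    return score, score >= threshold


# Hypothetical 8-dim embeddings; a real system would take them from the model.
enroll = [0.9, -0.4, 0.3, 0.1, 0.05, -0.2, 0.0, 0.1]
test = [0.8, -0.5, 0.2, 0.3, -0.1, 0.1, 0.2, -0.3]
for d in (2, 4, 8):
    score, accepted = verify(enroll, test, d, threshold=0.5)
    print(f"dim={d}: score={score:.3f} accepted={accepted}")
```

The norms are recomputed on the truncated prefix rather than reused from the full vector. Cosine scoring must re-normalize after truncation, because the prefix of a unit-length embedding is generally not unit length.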
Findings
Maintains high speaker verification performance with sub-embeddings as small as 8 dimensions.
Demonstrates the effectiveness of hierarchical embeddings on the VoxCeleb dataset.
Reduces storage and computational costs in large-scale speaker databases.
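The storage and compute savings follow from simple arithmetic. A flat table of N embeddings of dimension d stored as float32 needs 4Nd bytes, and scoring one query against every entry by dot product costs Nd multiply-accumulates (MACs). The sketch assumes a hypothetical database of one million speakers and a 256-dimensional full embedding. The full size is an assumption, not a figure from the paper. It also ignores index and metadata overhead.

```python
def embedding_storage_bytes(num_speakers: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for a flat table of embeddings (float32 by default)."""
    return num_speakers * dim * bytes_per_value


def scoring_macs(num_speakers: int, dim: int) -> int:
    """Multiply-accumulates for one query dot product against every stored entry."""
    return num_speakers * dim


N = 1_000_000  # hypothetical database size
for dim in (256, 8):  # 256 is an assumed full size; 8 is the smallest size reported
    mb = embedding_storage_bytes(N, dim) / 1e6
    print(f"dim={dim:>3}: {mb:,.0f} MB, {scoring_macs(N, dim):,} MACs per query")
```

With these assumed numbers it prints 1,024 MB and 256,000,000 MACs for 256 dimensions versus 32 MB and 8,000,000 MACs for 8 dimensions. Both storage and brute-force scoring cost scale linearly with d, so the reduction is 32× in each.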
Abstract
Fixed-dimensional speaker embeddings have become the dominant approach in speaker modeling, typically spanning hundreds to thousands of dimensions. The embedding size is a hyperparameter that is generally not chosen on a principled basis, and the individual dimensions are not hierarchically ordered by importance. In large-scale speaker representation databases, reducing the dimensionality of embeddings can significantly lower storage and computational costs. However, directly training low-dimensional representations often yields suboptimal performance. In this paper, we introduce the Matryoshka speaker embedding, a method that allows dynamic extraction of sub-dimensions from the embedding while maintaining performance. Our approach is validated on the VoxCeleb dataset, demonstrating that it can achieve extremely low-dimensional embeddings, such as 8 dimensions, while preserving high speaker verification performance.
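The abstract does not spell out the training objective. A common way to obtain nested, truncatable embeddings is the Matryoshka Representation Learning recipe. It attaches a separate classifier to each prefix length d and sums the per-prefix losses, so that every prefix is trained to separate speakers on its own. The sketch below assumes that recipe with plain softmax cross-entropy, equal default weights, and hypothetical names (`matryoshka_loss`, `heads`, `nested_dims`). M-Vec's actual loss, prefix set, and weighting may differ; speaker systems often use margin-based softmax variants. The sketch only computes the forward loss value; training would rely on an autodiff framework.

```python
from __future__ import annotations

import math
from typing import Mapping, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a single example, using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def linear(weight: Sequence[Sequence[float]], x: Sequence[float]) -> list[float]:
    """One logit per speaker: the dot product of each weight row with x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def matryoshka_loss(
    embedding: Sequence[float],
    label: int,
    heads: Mapping[int, Sequence[Sequence[float]]],
    nested_dims: Sequence[int],
    weights: Sequence[float] | None = None,
) -> float:
    """Weighted sum of classification losses, one per nested prefix length.

    heads[d] is a hypothetical (num_speakers x d) weight matrix that only
    ever sees embedding[:d], so each prefix must separate speakers on its own.
    """
    if weights is None:
        weights = [1.0] * len(nested_dims)
    if len(weights) != len(nested_dims):
        raise ValueError("need one weight per nested dimension")
    if max(nested_dims) > len(embedding):
        raise ValueError("a nested dimension exceeds the embedding size")
    total = 0.0
    for c, d in zip(weights, nested_dims):
        logits = linear(heads[d], embedding[:d])
        total += c * cross_entropy(logits, label)
    return total


# Toy call: 2 hypothetical speakers, a 4-dim embedding, prefixes of length 2 and 4.
heads = {
    2: [[0.5, -0.1], [-0.3, 0.2]],
    4: [[0.5, -0.1, 0.2, 0.0], [-0.3, 0.2, 0.1, 0.4]],
}
print(matryoshka_loss([1.0, 0.5, -0.2, 0.3], label=0, heads=heads, nested_dims=(2, 4)))
```

Because the short prefixes receive their own loss terms, the leading dimensions are pushed to carry the most speaker-discriminative information. This is what makes it reasonable to truncate the embedding after training.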
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
