Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation
Jaejun Lee, Kyogu Lee

TL;DR
Vo-Ve is an explainable voice-vector embedding that captures speaker identity as probabilities of explicit voice attribute classes. It evaluates speaker similarity competitively with conventional techniques while explaining the result in terms of those attributes.
Contribution
The paper presents Vo-Ve, a novel explainable voice-vector embedding that incorporates explicit voice attribute probabilities for better interpretability and evaluation.
Findings
Vo-Ve performs competitively in speaker similarity evaluation.
Vo-Ve offers high-level interpretability through voice attribute probabilities.
The authors expect the approach to enhance evaluation schemes across various speech tasks.
Abstract
In this paper, we propose Vo-Ve, a novel voice-vector embedding that captures speaker identity. Unlike conventional speaker embeddings, Vo-Ve is explainable, as it contains the probabilities of explicit voice attribute classes. Through extensive analysis, we demonstrate that Vo-Ve not only evaluates speaker similarity competitively with conventional techniques but also provides an interpretable explanation in terms of voice attributes. We strongly believe that Vo-Ve can enhance evaluation schemes across various speech tasks due to its high-level explainability.
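To make the idea of an embedding built from attribute probabilities concrete, here is a minimal sketch of how two such vectors might be compared and explained. It is not the authors' implementation. The attribute names, the example probability values, the use of cosine similarity as the score, and the helper names `cosine_similarity` and `explain_difference` are all assumptions made for illustration; the summary above does not specify the attribute set or the similarity measure.

```python
import math

# Hypothetical attribute classes; the actual attribute set used by Vo-Ve
# is not given in this summary.
ATTRIBUTES = ["male", "female", "low_pitch", "high_pitch", "husky", "bright"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def explain_difference(a, b, top_k=2):
    """Return the top_k attributes whose probabilities differ most.

    Each entry is (attribute_name, b_probability - a_probability).
    """
    diffs = [(name, pb - pa) for name, pa, pb in zip(ATTRIBUTES, a, b)]
    diffs.sort(key=lambda item: abs(item[1]), reverse=True)
    return diffs[:top_k]


# Made-up attribute probabilities for a reference voice and a generated voice.
reference = [0.9, 0.1, 0.8, 0.2, 0.3, 0.4]
generated = [0.9, 0.1, 0.3, 0.7, 0.3, 0.5]

similarity = cosine_similarity(reference, generated)
top_changes = explain_difference(reference, generated)

print(f"similarity: {similarity:.3f}")
for name, delta in top_changes:
    print(f"{name}: {delta:+.2f}")
```

Because every dimension corresponds to a named attribute, a low score can be traced back to the specific attributes that moved, which is the kind of explanation an opaque speaker embedding cannot give directly.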
Taxonomy
Topics: Speech Recognition and Synthesis
