Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation

Jaejun Lee; Kyogu Lee

arXiv:2506.19446 · cs.SD · June 25, 2025

TL;DR

Vo-Ve is an explainable voice-vector embedding that captures speaker identity as probabilities of explicit voice attribute classes. It evaluates speaker similarity competitively with conventional techniques and explains the result in terms of voice attributes.

Contribution

The paper presents Vo-Ve, a novel voice-vector embedding that contains the probabilities of explicit voice attribute classes, which makes speaker similarity evaluation interpretable.

Findings

01  Vo-Ve evaluates speaker similarity competitively with conventional techniques.

02  Vo-Ve offers high-level interpretability through voice attribute probabilities.

03  The authors expect the approach to enhance evaluation schemes across various speech tasks.

Abstract

In this paper, we propose Vo-Ve, a novel voice-vector embedding that captures speaker identity. Unlike conventional speaker embeddings, Vo-Ve is explainable, as it contains the probabilities of explicit voice attribute classes. Through extensive analysis, we demonstrate that Vo-Ve not only evaluates speaker similarity competitively with conventional techniques but also provides an interpretable explanation in terms of voice attributes. We strongly believe that Vo-Ve can enhance evaluation schemes across various speech tasks due to its high-level explainability.
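To make the idea of an explainable embedding concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a voice-vector can be treated as a list of attribute probabilities, that similarity is scored with cosine similarity, and that the explanation is the set of attributes with the largest probability gap. The attribute names, the example values, and both helper functions are hypothetical and do not come from the paper or its repository.

```python
import math

# Hypothetical attribute names, chosen only for illustration.
# The paper's actual voice attribute classes are not listed on this page.
ATTRIBUTES = ["low-pitched", "breathy", "nasal", "bright"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def explain_difference(a, b, names, top_k=1):
    """Return the top_k (name, gap) pairs with the largest absolute gap b - a."""
    gaps = [(name, y - x) for name, x, y in zip(names, a, b)]
    gaps.sort(key=lambda item: abs(item[1]), reverse=True)
    return gaps[:top_k]


# Made-up attribute probabilities for a reference voice and a candidate voice.
reference = [0.90, 0.20, 0.10, 0.30]
candidate = [0.85, 0.70, 0.15, 0.25]

score = cosine_similarity(reference, candidate)
top_gaps = explain_difference(reference, candidate, ATTRIBUTES)

print(f"similarity: {score:.3f}")
for name, gap in top_gaps:
    print(f"largest gap: {name} ({gap:+.2f})")
```

In this sketch a single number summarizes how close the two voices are, while the per-attribute gaps indicate which attribute accounts for most of the difference. A conventional speaker embedding would provide only the first of these.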

Code & Models

Repositories

jaejunl/vove (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis