Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits

Tiantian Feng; Jihwan Lee; Anfeng Xu; Yoonjeong Lee; Thanathai Lertpetchpun; Xuan Shi; Helin Wang; Thomas Thebaud; Laureano Moro-Velazquez; Dani Byrd; Najim Dehak; Shrikanth Narayanan

arXiv:2505.14648 · cs.SD · May 21, 2025

TL;DR

Vox-Profile is a comprehensive benchmark for characterizing diverse speaker and speech traits using foundation models, enabling multi-dimensional profiling and supporting various speech analysis applications.

Contribution

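The paper introduces a holistic, multi-dimensional benchmark of speaker and speech traits that is grounded in speech science and linguistics, developed with domain experts, and evaluated on over 15 publicly available speech datasets and several widely used speech foundation models.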

Findings

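01. Vox-Profile characterizes both static speaker traits (e.g., age, sex, accent) and dynamic speech properties (e.g., emotion, speech flow).

02. Its profiles support finer-grained analysis of ASR performance variability.

03. Its automatically generated profiles can be used to evaluate speech generation systems.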

Abstract

We introduce Vox-Profile, a comprehensive benchmark to characterize rich speaker and speech traits using speech foundation models. Unlike existing works that focus on a single dimension of speaker traits, Vox-Profile provides holistic and multi-dimensional profiles that reflect both static speaker traits (e.g., age, sex, accent) and dynamic speech properties (e.g., emotion, speech flow). This benchmark is grounded in speech science and linguistics, developed with domain experts to accurately index speaker and speech characteristics. We report benchmark experiments using over 15 publicly available speech datasets and several widely used speech foundation models that target various static and dynamic speaker and speech properties. In addition to benchmark experiments, we showcase several downstream applications supported by Vox-Profile. First, we show that Vox-Profile can augment existing…
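As a rough illustration of how a multi-dimensional profile could support analysis of ASR performance variability, the sketch below groups per-utterance word error rates by one trait. This is a minimal sketch under stated assumptions, not the Vox-Profile API: the `Profile` class, its field names, the label strings, the `mean_wer_by_trait` helper, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Profile:
    """Hypothetical per-utterance profile; field names are illustrative only."""
    age_group: str    # static speaker trait
    sex: str          # static speaker trait
    accent: str       # static speaker trait
    emotion: str      # dynamic speech property
    speech_flow: str  # dynamic speech property


def mean_wer_by_trait(profiles, wers, trait):
    """Average per-utterance word error rate within each value of one trait."""
    groups = defaultdict(list)
    for profile, wer in zip(profiles, wers):
        groups[getattr(profile, trait)].append(wer)
    return {value: mean(scores) for value, scores in groups.items()}


# Made-up toy values for illustration; these are not results from the paper.
profiles = [
    Profile("adult", "female", "accent_a", "neutral", "fluent"),
    Profile("adult", "male", "accent_b", "neutral", "fluent"),
    Profile("senior", "male", "accent_b", "angry", "disfluent"),
]
wers = [0.10, 0.20, 0.40]

by_accent = mean_wer_by_trait(profiles, wers, "accent")
by_flow = mean_wer_by_trait(profiles, wers, "speech_flow")
```

In practice the trait labels would come from trait classifiers rather than being typed by hand, and the same grouping can be repeated for any other field of the profile.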

Code & Models

Repositories

tiantiaf0627/vox-profile-release (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis