HeightCeleb - an enrichment of VoxCeleb dataset with speaker height information
Stanisław Kacprzak, Konrad Kowalczyk

TL;DR
This paper introduces HeightCeleb, a new dataset extending VoxCeleb with speaker height information, enabling improved speaker height estimation using existing speaker recognition models and simple regression techniques.
Contribution
The creation of the HeightCeleb dataset, with automatically extracted height annotations for VoxCeleb speakers, facilitating research in speaker height estimation without training new speaker embedding models.
Findings
Achieved state-of-the-art height estimation results on TIMIT using HeightCeleb data.
Demonstrated that pre-trained speaker embeddings can be effectively used for height prediction.
Showed that simple regression methods suffice for accurate height estimation.
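As a rough illustration of the "simple regression on pre-trained embeddings" idea, the sketch below fits a closed-form ridge regressor that maps fixed-size speaker embeddings to height in centimetres. It is not the authors' pipeline: the embedding extractor and the regressor used in the paper are not specified in this summary, so the embeddings and heights here are synthetic stand-ins, and every name (`fit_ridge`, `predict`, `alpha`, the 32-dimensional embedding size) is a hypothetical choice made for the example.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # center the features so the intercept is not regularized
    yc = y - y_mean
    dim = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(dim), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    return X @ w + b


def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic stand-in data (assumption): in practice, each row would be an
# embedding from a pre-trained speaker recognition model, and each target
# a height label such as those provided by HeightCeleb.
rng = np.random.default_rng(0)
n_speakers, dim = 400, 32
embeddings = rng.normal(size=(n_speakers, dim))
true_w = rng.normal(scale=1.5, size=dim)
heights_cm = 170.0 + embeddings @ true_w + rng.normal(scale=2.0, size=n_speakers)

# Hold out the last 100 speakers for evaluation.
X_train, X_test = embeddings[:300], embeddings[300:]
y_train, y_test = heights_cm[:300], heights_cm[300:]

w, b = fit_ridge(X_train, y_train, alpha=1.0)
mae_ridge = mean_absolute_error(y_test, predict(X_test, w, b))

# Baseline: always predict the mean training height.
mae_baseline = mean_absolute_error(y_test, np.full_like(y_test, y_train.mean()))

print(f"ridge MAE: {mae_ridge:.2f} cm, mean-height baseline MAE: {mae_baseline:.2f} cm")
```

Comparing against the mean-height baseline is a quick sanity check that the embeddings carry height information at all; with real data, the split should be made by speaker so that no speaker appears in both the training and test sets.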
Abstract
Prediction of a speaker's height is of interest for voice forensics, surveillance, and automatic speaker profiling. Until now, TIMIT has been the most popular dataset for training and evaluating height estimation methods. In this paper, we introduce HeightCeleb, an extension to VoxCeleb, a dataset commonly used in speaker recognition tasks. This enrichment consists of adding height information for all 1251 speakers from VoxCeleb, extracted with an automated method from publicly available sources. Such annotated data will enable the research community to utilize freely available speaker embedding extractors, pre-trained on VoxCeleb, to build more efficient speaker height estimators. In this work, we describe the creation of the HeightCeleb dataset and show that it makes it possible to achieve state-of-the-art results on the TIMIT test set by using simple…
Taxonomy
Topics: Speech Recognition and Synthesis
