Phone Duration Modeling for Speaker Age Estimation in Children
Prashanth Gurunath Shivakumar, Somer Bishop, Catherine Lord, Shrikanth Narayanan

TL;DR
This paper introduces an approach for estimating a child speaker's age from phone duration features derived from forced alignment, and reports robustness across datasets and developmental stages.
Contribution
It proposes a new feature set based on phone durations for children's age estimation, addressing challenges of variability and data scarcity in child speech.
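As a rough illustration of what a phone-duration feature could look like, the sketch below averages the duration of each phone over a forced alignment. It is not the paper's feature extraction: the `(phone, start, end)` tuple format, the function name `phone_duration_features`, the toy phone set, and the choice of a per-phone mean are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def phone_duration_features(alignment, phone_set):
    """Return the mean duration in seconds of each phone in phone_set.

    alignment: iterable of (phone, start_sec, end_sec) tuples, assumed to
    come from some forced aligner (hypothetical format for this sketch).
    Phones that never occur in the alignment get 0.0.
    """
    durations = defaultdict(list)
    for phone, start, end in alignment:
        if end > start:  # skip empty or malformed segments
            durations[phone].append(end - start)
    # Fixed ordering over phone_set gives a fixed-length feature vector.
    return [mean(durations[p]) if p in durations else 0.0 for p in phone_set]


# Toy alignment with made-up timings, for illustration only.
alignment = [
    ("k", 0.00, 0.08),
    ("ae", 0.08, 0.25),
    ("t", 0.25, 0.31),
    ("ae", 0.50, 0.70),
]
phone_set = ["ae", "k", "t", "s"]
features = phone_duration_features(alignment, phone_set)
```

A vector like `features` could then be fed to any regressor that predicts age. Which statistics and which model the authors actually use is not stated in this summary.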
Findings
Phone duration features effectively predict children's age.
The approach is robust across different datasets and age groups.
Certain phonemes are more indicative of age than others.
Abstract
Automatic inference of important paralinguistic information such as age from speech is an important area of research with numerous applications in spoken language technology. Speaker age estimation has applications in enabling personalization and age-appropriate curation of information and content. However, research on speaker age estimation in children is especially challenging due to the paucity of relevant speech data representing the developmental spectrum, and the high signal variability, especially intra-age variability, that complicates modeling. Most approaches to children's speaker age estimation adopt methods directly from research on adult speech processing. In this paper, we propose features specific to children and focus on the speaker's phone duration as an important biomarker of children's age. We propose phone duration modeling for predicting age from a child's speech. To enable…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
