Data Efficient Child-Adult Speaker Diarization with Simulated Conversations
Anfeng Xu, Tiantian Feng, Helen Tager-Flusberg, Catherine Lord, Shrikanth Narayanan

TL;DR
This paper introduces a data-efficient child-adult speaker diarization method using simulated conversations and minimal real data, achieving strong zero-shot performance and significant improvements with limited fine-tuning.
Contribution
The authors train a Whisper Encoder-based diarization model on child-adult conversations simulated from AudioSet, reducing the need for extensive annotated datasets in child-adult speaker diarization.
Findings
Strong zero-shot performance on real datasets
Performance improves substantially after fine-tuning on only 30 minutes of real data
LoRA enhances transfer learning effectiveness
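To illustrate the LoRA finding, here is a minimal sketch of a low-rank adapter on a single linear layer, written in plain NumPy. It is not the authors' implementation: the class name `LoRALinear`, the layer size, the rank, and the scaling value are all hypothetical choices for illustration. The idea is that the pretrained weight stays frozen while two small matrices `A` and `B` are the only trainable parameters, which is one reason such adapters can suit fine-tuning on small amounts of data.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank, alpha, rng):
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))    # trainable, small random init
        self.B = np.zeros((out_dim, rank))                     # trainable, zero init
        self.scale = alpha / rank                              # assumed scaling convention

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction B @ A.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def n_trainable(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(0)
base_weight = rng.normal(size=(64, 64))      # stand-in for one pretrained layer
layer = LoRALinear(base_weight, rank=4, alpha=8.0, rng=rng)

x = rng.normal(size=(10, 64))
base_out = x @ base_weight.T
lora_out = layer(x)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `layer.n_trainable()` values (far fewer than `base_weight.size`) would be updated during fine-tuning.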
Abstract
Automating child speech analysis is crucial for applications such as neurocognitive assessments. Speaker diarization, which identifies "who spoke when", is an essential component of the automated analysis. However, publicly available child-adult speaker diarization solutions are scarce due to privacy concerns and a lack of annotated datasets, while manually annotating data for each scenario is both time-consuming and costly. To overcome these challenges, we propose a data-efficient solution by creating simulated child-adult conversations using AudioSet. We then train a Whisper Encoder-based model, achieving strong zero-shot performance on child-adult speaker diarization using real datasets. The model performance improves substantially when fine-tuned with only 30 minutes of real training data, with LoRA further improving the transfer learning performance. The source code and the…
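The abstract does not spell out the simulation procedure, so the following is only a rough sketch of how a simulated two-speaker conversation with frame-level labels could be assembled. Everything here is an assumption for illustration: the function name `simulate_conversation`, the random silence gaps, the 20 ms label resolution, and the use of random noise as a stand-in for child and adult speech clips (the paper draws its clips from AudioSet).

```python
import numpy as np

SAMPLE_RATE = 16000   # assumed; Whisper models take 16 kHz audio
FRAME_HOP = 320       # assumed label resolution: one label per 20 ms


def simulate_conversation(child_utts, adult_utts, n_turns, rng, max_gap_s=1.0):
    """Concatenate randomly chosen utterances separated by random silences.

    Returns (audio, labels); labels has shape (n_frames, 2), where
    column 0 marks child speech and column 1 marks adult speech.
    """
    pieces, spans = [], []   # spans hold (start_sample, end_sample, speaker)
    cursor = 0
    for _ in range(n_turns):
        speaker = int(rng.integers(0, 2))          # 0 = child, 1 = adult
        pool = child_utts if speaker == 0 else adult_utts
        utt = pool[int(rng.integers(0, len(pool)))]
        gap = int(rng.uniform(0.0, max_gap_s) * SAMPLE_RATE)
        pieces.append(np.zeros(gap, dtype=np.float32))   # silence before the turn
        cursor += gap
        pieces.append(utt.astype(np.float32))
        spans.append((cursor, cursor + len(utt), speaker))
        cursor += len(utt)

    audio = np.concatenate(pieces)
    n_frames = int(np.ceil(len(audio) / FRAME_HOP))
    labels = np.zeros((n_frames, 2), dtype=np.int8)
    for start, end, speaker in spans:
        first = start // FRAME_HOP
        last = int(np.ceil(end / FRAME_HOP))
        labels[first:last, speaker] = 1
    return audio, labels


rng = np.random.default_rng(0)
# Placeholder "utterances": noise bursts standing in for real speech clips.
child = [rng.normal(size=SAMPLE_RATE).astype(np.float32) for _ in range(3)]
adult = [rng.normal(size=2 * SAMPLE_RATE).astype(np.float32) for _ in range(3)]
audio, labels = simulate_conversation(child, adult, n_turns=6, rng=rng)
```

The resulting `labels` array is the kind of "who spoke when" target a frame-level diarization model could be trained against. A more realistic simulator would likely also model overlapping speech and background noise, which this sketch omits.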
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
