Data Efficient Child-Adult Speaker Diarization with Simulated Conversations

Anfeng Xu; Tiantian Feng; Helen Tager-Flusberg; Catherine Lord; Shrikanth Narayanan

arXiv:2409.08881 · eess.AS · June 13, 2025 · ICASSP

TL;DR

This paper introduces a data-efficient child-adult speaker diarization method using simulated conversations and minimal real data, achieving strong zero-shot performance and significant improvements with limited fine-tuning.

Contribution

The authors develop a novel approach that leverages simulated data for training, reducing the need for extensive annotated datasets in child-adult speaker diarization.
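The simulation idea can be illustrated with a minimal sketch. This is not the authors' pipeline, which draws audio from AudioSet using sampling rules not described on this page. The sketch only places hypothetical child and adult utterance durations on a timeline with random pauses and occasional overlaps, then converts the result into frame-level two-speaker targets of the kind a diarization model can be trained on. The names `simulate_conversation`, `frame_labels` and `Segment`, the gap and overlap parameters, and the 20 ms frame step are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # "child" or "adult"
    start: float  # seconds
    end: float    # seconds


def simulate_conversation(child_durs, adult_durs, max_gap=1.0,
                          max_overlap=0.3, overlap_prob=0.1, seed=0):
    """Hypothetical sketch: alternate child/adult utterances on a timeline.

    child_durs / adult_durs are utterance lengths in seconds; a real
    pipeline would take them from actual child and adult speech clips.
    """
    rng = random.Random(seed)
    queues = {"child": list(child_durs), "adult": list(adult_durs)}
    other = {"child": "adult", "adult": "child"}
    speaker = rng.choice(["child", "adult"])
    segments, t = [], 0.0  # t = latest utterance end time seen so far
    while queues["child"] or queues["adult"]:
        if not queues[speaker]:  # one side ran out: keep using the other
            speaker = other[speaker]
        dur = queues[speaker].pop(0)
        if segments and rng.random() < overlap_prob:
            offset = -rng.uniform(0.0, max_overlap)  # start early: overlap
        else:
            offset = rng.uniform(0.0, max_gap)  # pause between turns
        start = max(0.0, t + offset)
        segments.append(Segment(speaker, start, start + dur))
        t = max(t, start + dur)
        speaker = other[speaker]  # hand the turn to the other speaker
    return segments


def frame_labels(segments, frame_sec=0.02):
    """Multi-label targets: one [child, adult] pair of 0/1 flags per frame."""
    n = math.ceil(max(s.end for s in segments) / frame_sec)
    labels = [[0, 0] for _ in range(n)]
    col = {"child": 0, "adult": 1}
    for s in segments:
        first = int(s.start // frame_sec)
        last = math.ceil(s.end / frame_sec)
        for i in range(first, min(last, n)):
            labels[i][col[s.speaker]] = 1
    return labels
```

Because each speaker gets its own flag per frame, overlapping speech is simply a frame where both flags are 1, which is why a multi-label target suits the two-speaker child-adult setting.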

Findings

01 Strong zero-shot performance on real datasets

02 Substantial improvement after fine-tuning on only 30 minutes of real data

03 LoRA further improves transfer-learning performance
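LoRA (low-rank adaptation) freezes a pretrained weight matrix and learns only a low-rank correction to it, which makes it a natural fit for fine-tuning on very little data. The NumPy sketch below shows the forward pass under the usual conventions: rank r, scaling alpha / r, and B initialized to zero so the adapted layer starts out identical to the pretrained one. It is not the paper's code; which Whisper encoder layers receive adapters, and the values of r and alpha, are assumptions. `LoRALinear` is a hypothetical class name.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (LoRA-style).

    Computes y = x W^T + b + (alpha / r) * x A^T B^T, where W and b stay
    frozen and only A (r x in_dim) and B (out_dim x r) would be trained.
    """

    def __init__(self, weight, bias, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight, self.bias = weight, bias          # pretrained, frozen
        self.A = rng.normal(0.0, 0.01, (r, in_dim))    # small random init
        self.B = np.zeros((out_dim, r))                # zero init: no change at start
        self.scale = alpha / r

    def __call__(self, x):
        base = x @ self.weight.T + self.bias
        return base + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        """Fold the update into W so inference needs no extra matmuls."""
        return self.weight + self.scale * self.B @ self.A

    def num_trainable(self):
        return self.A.size + self.B.size
```

For a 512 x 512 projection with r = 8, only 8,192 of its 262,144 weights would be trained. In practice, libraries such as Hugging Face PEFT apply this kind of wrapping to PyTorch models; the class here is only a self-contained illustration of the idea.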

Abstract

Automating child speech analysis is crucial for applications such as neurocognitive assessments. Speaker diarization, which identifies "who spoke when", is an essential component of the automated analysis. However, publicly available child-adult speaker diarization solutions are scarce due to privacy concerns and a lack of annotated datasets, while manually annotating data for each scenario is both time-consuming and costly. To overcome these challenges, we propose a data-efficient solution by creating simulated child-adult conversations using AudioSet. We then train a Whisper Encoder-based model, achieving strong zero-shot performance on child-adult speaker diarization using real datasets. Model performance improves substantially when fine-tuned with only 30 minutes of real training data, with LoRA further improving the transfer learning performance. The source code and the…
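Diarization output of the "who spoke when" kind is often produced by scoring every short frame for each speaker class and then merging consecutive active frames into time-stamped segments. The sketch below assumes that style of output: independent child and adult scores per frame, a fixed 0.5 threshold, and a 20 ms frame step (the Whisper encoder's output rate, 1500 frames per 30 s input). The paper's actual post-processing may differ, and `frames_to_segments` is a hypothetical helper.

```python
def frames_to_segments(probs, threshold=0.5, frame_sec=0.02,
                       names=("child", "adult")):
    """Turn per-frame [p_child, p_adult] scores into (speaker, start, end) tuples.

    Each speaker is thresholded independently, so overlapping speech yields
    overlapping segments. Times are in seconds.
    """
    segments = []
    for k, name in enumerate(names):
        start = None  # index of the first frame of the current active run
        for i, frame in enumerate(probs):
            active = frame[k] >= threshold
            if active and start is None:
                start = i
            elif not active and start is not None:
                segments.append((name, start * frame_sec, i * frame_sec))
                start = None
        if start is not None:  # run continues to the last frame
            segments.append((name, start * frame_sec, len(probs) * frame_sec))
    return sorted(segments, key=lambda s: (s[1], s[2], s[0]))
```

A real system would usually also smooth the frame scores (for example with a median filter) and drop very short segments before evaluation.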

Code & Models

Repositories

usc-sail/child-adult-diarization
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems