Meta-learning for robust child-adult classification from speech
Nithin Rao Koluguri, Manoj Kumar, So Hyun Kim, Catherine Lord, and Shrikanth Narayanan

TL;DR
This paper applies meta-learning with prototypical networks to child-adult speaker classification in clinical conversations, reporting relative gains of up to 14.53% in F1-score and up to 9.66% in cluster purity.
Contribution
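It treats every child-adult pair in the training set as a separate meta-learning task and trains a prototypical network across those tasks, aiming for a representation that is more robust to speaker and channel variability. The summary presents this as a novel approach in this domain.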
Findings
Up to 14.53% relative improvement in F1-scores for weakly supervised classification.
Up to 9.66% relative improvement in cluster purity.
Prototypical networks outperform traditional speaker embeddings in this task.
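The findings above are stated as relative improvements and as cluster purity, but this page does not spell out either formula. The sketch below assumes the standard definitions: purity is the fraction of segments that carry the majority label of their cluster, and relative improvement is the gain divided by the baseline score. The function names and the toy values are hypothetical and are not results from the paper.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Fraction of items whose label matches the majority label of their cluster."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cluster, []).append(label)
    # Count the majority-label members of every cluster, then normalize.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(true_labels)


def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Hypothetical toy clustering of six speech segments into two clusters.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["child", "child", "adult", "adult", "adult", "adult"]
toy_purity = cluster_purity(toy_clusters, toy_labels)

# Hypothetical baseline and new scores, chosen only to illustrate the arithmetic.
toy_gain = relative_improvement(0.80, 0.88)
```

A relative figure such as 14.53% therefore describes a ratio of scores, not an absolute gain of 14.53 F1 points.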
Abstract
Computational modeling of naturalistic conversations in clinical applications has seen growing interest in the past decade. An important use-case involves child-adult interactions within the autism diagnosis and intervention domain. In this paper, we address a specific sub-problem of speaker diarization, namely child-adult speaker classification in such dyadic conversations with specified roles. Training a speaker classification system robust to speaker and channel conditions is challenging due to inherent variability in the speech within children and the adult interlocutors. In this work, we propose the use of meta-learning, in particular, prototypical networks which optimize a metric space across multiple tasks. By modeling every child-adult pair in the training set as a separate task during meta-training, we learn a representation with improved generalizability compared to…
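The per-pair task construction described in the abstract can be illustrated with a single prototypical-network episode. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the embeddings are hand-written 2-D points standing in for the output of a learned network, class 0 is taken to mean child and class 1 adult, and all function names are hypothetical. Each class prototype is the mean of its support embeddings, and query segments are classified by a softmax over negative squared Euclidean distances to the prototypes.

```python
import numpy as np


def compute_prototypes(support_emb, support_labels, n_classes=2):
    """Mean support embedding per class (assumed: 0 = child, 1 = adult)."""
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)]
    )


def class_probabilities(query_emb, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    diffs = query_emb[:, None, :] - protos[None, :, :]
    logits = -(diffs ** 2).sum(axis=-1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def episode_loss(query_emb, query_labels, protos):
    """Mean negative log-likelihood of the true class over the query set."""
    probs = class_probabilities(query_emb, protos)
    picked = probs[np.arange(len(query_labels)), query_labels]
    return float(-np.log(picked).mean())


# One episode = one hypothetical child-adult pair (toy embeddings, not real data).
support_emb = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]])
support_labels = np.array([0, 0, 1, 1])
query_emb = np.array([[0.2, 0.4], [3.9, 4.6]])
query_labels = np.array([0, 1])

protos = compute_prototypes(support_emb, support_labels)
predictions = class_probabilities(query_emb, protos).argmax(axis=1)
loss = episode_loss(query_emb, query_labels, protos)
```

In meta-training, a loss of this kind would be averaged over many episodes, each drawn from a different child-adult pair, and backpropagated through the embedding network. That step is omitted here because the sketch has no trainable parameters.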
