Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions

Anfeng Xu; Kevin Huang; Tiantian Feng; Lue Shen; Helen Tager-Flusberg; Shrikanth Narayanan

arXiv:2406.07890 · eess.AS · June 13, 2024

TL;DR

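This paper investigates speech foundation models for speaker diarization in child-adult interactions. It reports 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker Confusion Rate, respectively, over previous speaker diarization methods, and benchmarks how input audio window size, speaker demographics, and training data ratio affect performance.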

Contribution

It is the first to evaluate speech foundation models specifically for child-adult speaker diarization, showing their potential to improve low-resource child speech understanding.

Findings

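01. 39.5% relative reduction in Diarization Error Rate compared to previous speaker diarization methods

02. 62.3% relative reduction in Speaker Confusion Rate compared to previous speaker diarization methods

03. Benchmarking shows how input audio window size, speaker demographics, and training data ratio affect diarization results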

Abstract

Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding problems, such as child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker Confusion Rate, respectively, compared to previous speaker diarization methods. In addition, we benchmark and evaluate the speaker diarization results of the speech foundation models while varying the input audio window size, speaker demographics, and training data ratio. Our results highlight promising pathways for understanding and adopting speech foundation models to facilitate child speech understanding.
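To make the reported metrics concrete, the sketch below computes Diarization Error Rate under its commonly used definition (missed speech, false alarm, and speaker confusion time divided by total reference speech time) and the relative reduction between two systems. The class name, function name, and all durations are illustrative assumptions, not values or code from the paper; whether the paper uses exactly this scoring setup is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class DiarizationErrors:
    """Error durations in seconds, accumulated over an evaluation set.

    Hypothetical container for illustration; not taken from the paper's code.
    """
    missed_speech: float
    false_alarm: float
    speaker_confusion: float
    total_reference_speech: float

    def der(self) -> float:
        # Common DER definition: all error time over total reference speech time.
        errors = self.missed_speech + self.false_alarm + self.speaker_confusion
        return errors / self.total_reference_speech

    def confusion_rate(self) -> float:
        # Share of reference speech time attributed to the wrong speaker.
        return self.speaker_confusion / self.total_reference_speech


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` improves on `baseline` (0.4 means 40% lower)."""
    return (baseline - new) / baseline


# Toy numbers chosen for easy arithmetic; they are NOT results from the paper.
baseline = DiarizationErrors(30.0, 20.0, 50.0, 1000.0)
improved = DiarizationErrors(25.0, 15.0, 20.0, 1000.0)

der_drop = relative_reduction(baseline.der(), improved.der())
conf_drop = relative_reduction(baseline.confusion_rate(), improved.confusion_rate())

print(f"Relative DER reduction: {der_drop:.1%}")
print(f"Relative confusion reduction: {conf_drop:.1%}")
```

With these toy durations the baseline DER is 10% and the improved DER is 6%, a 40% relative reduction. A relative reduction is always measured against the baseline's own error, so a 39.5% relative reduction does not mean DER fell by 39.5 percentage points.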

Code & Models

Repositories

usc-sail/child-adult-diarization (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis