Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism

Rimita Lahiri; Tiantian Feng; Rajat Hebbar; Catherine Lord; So Hyun Kim; Shrikanth Narayanan

arXiv:2307.16398 · eess.AS · August 1, 2023 · Interspeech · 1 cites

Open Access

TL;DR

This paper improves child-adult speaker classification in interactions involving children with Autism by adding a self-supervised pre-training stage on unlabelled child speech, achieving a 9-13% relative improvement in F1 score over the baseline.

Contribution

The paper applies additional self-supervised pre-training (Wav2vec 2.0 and WavLM) on unlabelled child speech to improve child-adult classification accuracy in neurodiverse child-inclusive interactions.

Findings

01

Achieved a 9-13% relative improvement in classification F1 score over the state-of-the-art baseline.

02

Demonstrated robustness across different demographic subgroups.

03

Validated effectiveness on clinical datasets involving children with Autism.
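
The "relative improvement" in the first finding is a ratio against the baseline score, not a difference in absolute F1 points. A minimal sketch of that arithmetic follows; the function name and the two scores are hypothetical, chosen only for illustration, and are not results from the paper.

```python
def relative_improvement(baseline_f1: float, new_f1: float) -> float:
    """Return the gain of new_f1 over baseline_f1 as a percentage of the baseline."""
    if baseline_f1 <= 0:
        raise ValueError("baseline F1 must be positive")
    return 100.0 * (new_f1 - baseline_f1) / baseline_f1


# Hypothetical scores (NOT taken from the paper), used only to show the arithmetic.
baseline = 0.70
improved = 0.77
gain = relative_improvement(baseline, improved)
print(f"{gain:.1f}% relative improvement")
```

Under these made-up numbers, an absolute gain of 0.07 F1 corresponds to a relative gain of about 10%, which is the kind of figure the 9-13% range describes.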

Abstract

We address the problem of detecting who spoke when in child-inclusive spoken interactions, i.e., automatic child-adult speaker classification. Interactions involving children are richly heterogeneous due to developmental differences. The presence of neurodiversity, e.g., due to Autism, contributes additional variability. We investigate the impact of additional pre-training with more unlabelled child speech on child-adult classification performance. We pre-train our model with child-inclusive interactions, following two recent self-supervision algorithms, Wav2vec 2.0 and WavLM, with a contrastive loss objective. We report a 9-13% relative improvement over the state-of-the-art baseline with regard to classification F1 scores on two clinical interaction datasets involving children with Autism. We also analyze the impact of pre-training under different conditions by evaluating our model…
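
The abstract mentions pre-training with a contrastive loss objective. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive loss of the kind used in Wav2vec 2.0: a context vector is scored against the true target and a set of distractors by temperature-scaled cosine similarity, and the loss is the negative log-probability of the true target. The function names, vector values, and the temperature default are assumptions for this example; this is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-softmax score of the true target among all candidates."""
    candidates = [target] + list(distractors)
    logits = [cosine(context, c) / temperature for c in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy vectors (hypothetical): the context matches the target in the first case
# and matches a distractor in the second.
aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(aligned < misaligned)
```

The loss is small when the context vector is closest to its true target and grows when a distractor is a better match, which is the signal that drives the encoder during self-supervised pre-training.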

Taxonomy

Topics: Speech and dialogue systems · Language Development and Disorders · Speech Recognition and Synthesis