Learning Domain Invariant Representations for Child-Adult Classification from Speech
Rimita Lahiri, Manoj Kumar, Somer Bishop, Shrikanth Narayanan

TL;DR
This paper develops a domain adversarial learning approach to child-adult speaker classification in speech from autism diagnostic sessions. It addresses variability due to the child's age and the data source, and reports up to 13.45% relative improvement over conventional methods.
Contribution
It introduces domain adversarial training methods that learn speaker embeddings invariant to the child's age and the data source. This makes classification more robust without requiring labeled target-domain data.
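The mechanism behind such invariant embeddings can be illustrated with gradient reversal, one common way to implement domain adversarial training (as in Ganin and Lempitsky's DANN). This is a minimal sketch under stated assumptions, not the authors' model. The one-layer tanh encoder, the logistic child/adult and domain heads, the single binary domain label (standing in for, e.g., data source), the `labeled` mask, and the names `init_params`, `gradients` and `dann_step` are all hypothetical. Each head minimizes its own loss. The shared encoder follows the task gradient minus `lam` times the domain gradient, which pushes the embedding to stay useful for child-adult classification while carrying less information about the domain.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def bce(p, t, eps=1e-12):
    """Mean binary cross-entropy between probabilities p and 0/1 targets t."""
    return float(-np.mean(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps)))


def init_params(d_in, d_emb, seed=0):
    # Hypothetical toy model: linear+tanh encoder and two logistic heads.
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(scale=0.5, size=(d_in, d_emb)),  # shared encoder
        "w_task": rng.normal(scale=0.5, size=d_emb),         # child/adult head
        "w_dom": rng.normal(scale=0.5, size=d_emb),          # domain head
    }


def forward(params, x):
    z = np.tanh(x @ params["W_enc"])     # embedding, shape (n, d_emb)
    p_y = sigmoid(z @ params["w_task"])  # P(adult | x), assumed label coding
    p_d = sigmoid(z @ params["w_dom"])   # P(domain = 1 | x)
    return z, p_y, p_d


def gradients(params, x, y, d, labeled):
    """Gradients of the task and domain losses w.r.t. every parameter."""
    z, p_y, p_d = forward(params, x)
    # Task loss: mean BCE over samples that carry a child/adult label only.
    e_y = (p_y - y) * labeled / max(labeled.sum(), 1.0)
    # Domain loss: mean BCE over all samples (domain labels are always known).
    e_d = (p_d - d) / x.shape[0]
    g_task_head = z.T @ e_y
    g_dom_head = z.T @ e_d
    # Back-propagate each loss through tanh into the shared encoder.
    g_enc_task = x.T @ (np.outer(e_y, params["w_task"]) * (1 - z**2))
    g_enc_dom = x.T @ (np.outer(e_d, params["w_dom"]) * (1 - z**2))
    return g_task_head, g_dom_head, g_enc_task, g_enc_dom


def dann_step(params, x, y, d, labeled, lam=0.5, lr=0.1):
    """One gradient step with a gradient-reversal update for the encoder."""
    g_th, g_dh, g_et, g_ed = gradients(params, x, y, d, labeled)
    return {
        # Each head descends its own loss.
        "w_task": params["w_task"] - lr * g_th,
        "w_dom": params["w_dom"] - lr * g_dh,
        # Reversal: the encoder descends the task loss but ascends the
        # domain loss (scaled by lam), discouraging domain information.
        "W_enc": params["W_enc"] - lr * (g_et - lam * g_ed),
    }


if __name__ == "__main__":
    # Synthetic demo: the second half of the data is an unlabeled "target"
    # domain whose feature 1 is shifted.
    rng = np.random.default_rng(0)
    n, d_in = 200, 5
    x = rng.normal(size=(n, d_in))
    dom = (np.arange(n) >= n // 2).astype(float)  # 0 = source, 1 = target
    x[:, 1] += 2.0 * dom                          # synthetic domain shift
    y = (x[:, 0] > 0).astype(float)               # synthetic child/adult label
    labeled = 1.0 - dom                           # target labels are never used
    params = init_params(d_in, 8)
    for _ in range(500):
        params = dann_step(params, x, y, dom, labeled)
    _, p_y, p_d = forward(params, x)
    print("target accuracy:", np.mean((p_y[dom == 1] > 0.5) == y[dom == 1]))
    print("domain loss:", bce(p_d, dom))
```

Setting `lam=0` reduces the encoder update to ordinary task training with an independent domain probe; larger values trade task fit for domain invariance. Since the paper targets two factors (age and data source), a sketch along these lines would add a second domain head whose reversed gradient is also folded into the encoder update.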
Findings
Up to 13.45% relative improvement over conventional methods.
Effective handling of variability due to age and data source.
Demonstrated on a large autism diagnostic dataset.
Abstract
Diagnostic procedures for autism spectrum disorder (ASD) involve semi-naturalistic interactions between the child and a clinician. Computational methods for analyzing these sessions require an end-to-end speech and language processing pipeline that goes from raw audio to clinically meaningful behavioral features. An important component of this pipeline is the ability to automatically detect who is speaking when, i.e., to perform child-adult speaker classification. This binary classification task is often confounded by variability in the participants' speech and in background conditions. Further, the scarcity of training data often restricts the direct application of conventional deep learning methods. In this work, we address two major sources of variability, the age of the child and the location where the data were collected, using domain adversarial learning, which does not require labeled target…
