Speaker- and Age-Invariant Training for Child Acoustic Modeling Using Adversarial Multi-Task Learning

Mostafa Shahin, Beena Ahmed, and Julien Epps

arXiv:2210.10231·cs.SD·November 8, 2022·1 cites

TL;DR

This paper introduces an adversarial multi-task learning approach to develop child speech acoustic models that are invariant to speaker and age variations, improving speech recognition accuracy.

Contribution

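The paper proposes an adversarial multi-task training method in which one shared generator network feeds three discrimination networks (phoneme, age, and speaker), so that the learned features stay useful for phoneme recognition while carrying little speaker or age information.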

Findings

01. Achieved a 13% reduction in word error rate (WER) on the OGI speech corpus

02. Demonstrated the effectiveness of adversarial multi-task learning for speaker and age invariance

03. Improved the robustness of child speech recognition systems

Abstract

One of the major challenges in acoustic modelling of child speech is the rapid changes that occur in the children's articulators as they grow up, their differing growth rates and the subsequent high variability in the same age group. These high acoustic variations along with the scarcity of child speech corpora have impeded the development of a reliable speech recognition system for children. In this paper, a speaker- and age-invariant training approach based on adversarial multi-task learning is proposed. The system consists of one generator shared network that learns to generate speaker- and age-invariant features connected to three discrimination networks, for phoneme, age, and speaker. The generator network is trained to minimize the phoneme-discrimination loss and maximize the speaker- and age-discrimination losses in an adversarial multi-task learning fashion. The generator…
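The generator objective described in the abstract can be sketched as a single scalar: the phoneme loss minus weighted speaker and age losses, so that minimizing it lowers phoneme error while raising the two adversarial losses. This is a minimal illustration, not the authors' implementation. The weights `lam_speaker` and `lam_age`, the function names, and the toy logits are all assumptions introduced here; the abstract does not say how the three terms are balanced or how the maximization is carried out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of one example given raw logits and the true class index."""
    return -math.log(softmax(logits)[target])


def generator_objective(phoneme_loss, speaker_loss, age_loss,
                        lam_speaker=1.0, lam_age=1.0):
    """Scalar the shared generator would minimize.

    Minimizing it lowers the phoneme loss and raises the speaker and age
    losses. lam_speaker and lam_age are hypothetical trade-off weights.
    """
    return phoneme_loss - lam_speaker * speaker_loss - lam_age * age_loss


# Toy logits standing in for the outputs of the three discrimination networks.
phoneme_loss = cross_entropy([2.0, 0.1, -1.0], target=0)  # phoneme head
speaker_loss = cross_entropy([0.3, 0.2, 0.1], target=2)   # speaker head
age_loss = cross_entropy([0.0, 0.0], target=1)            # age head

objective = generator_objective(phoneme_loss, speaker_loss, age_loss,
                                lam_speaker=0.5, lam_age=0.5)
print(f"generator objective: {objective:.4f}")
```

In a real system the three losses would come from trained networks and the subtraction would be applied through gradients, for example with a gradient-reversal layer; that choice is likewise an assumption rather than something the abstract states.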

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing