Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

Helin Wang; Venkatesh Ravichandran; Milind Rao; Becky Lammers; Myra Sydnor; Nicholas Maragakis; Ankur A. Butala; Jayne Zhang; Lora Clawson; Victoria Chovaz; Laureano Moro-Velazquez

arXiv:2311.10149 · eess.AS · November 20, 2023 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces Aty-TTS, a novel TTS-based data augmentation method that improves the fairness and performance of spoken language understanding (SLU) systems on atypical speech by capturing the unique vocal characteristics of atypical speakers.

Contribution

The paper presents Aty-TTS, a new TTS finetuning approach that models the characteristics of atypical speech for data augmentation, improving SLU fairness for speakers with neurological conditions and motor impairments.

Findings

01

Aty-TTS generates high-quality atypical speech data.

02

Augmented data improves SLU accuracy for atypical speech.

03

The method enhances fairness in SLU systems.

Abstract

Spoken language understanding (SLU) systems often exhibit suboptimal performance in processing atypical speech, typically caused by neurological conditions and motor impairments. Recent advancements in Text-to-Speech (TTS) synthesis-based augmentation for more fair SLU have struggled to accurately capture the unique vocal characteristics of atypical speakers, largely due to insufficient data. To address this issue, we present a novel data augmentation method for atypical speakers by finetuning a TTS model, called Aty-TTS. Aty-TTS models speaker and atypical characteristics via knowledge transferring from a voice conversion model. Then, we use the augmented data to train SLU models adapted to atypical speech. To train these data augmentation models and evaluate the resulting SLU systems, we have collected a new atypical speech dataset containing intent annotation. Both objective and…

Code & Models

Repositories

wanghelin1997/aty-tts
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems