Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

Shujie Hu; Xurong Xie; Mengzhe Geng; Mingyu Cui; Jiajun Deng; Guinan Li; Tianzi Wang; Xunying Liu; Helen Meng

arXiv:2206.07327·eess.AS·June 23, 2023

Open Access

TL;DR

This paper introduces a cross-domain and cross-lingual approach to generate ultrasound tongue imaging features for speech recognition in elderly and dysarthric speakers, improving accuracy over acoustic-only systems.

Contribution

It presents an acoustic-to-articulatory (A2A) inversion method that is pre-trained on parallel audio and ultrasound data, then adapted across domains and languages to generate articulatory features for recognizing atypical speech.

Findings

01 Significant word error rate reductions up to 4.75%.

02 Consistent improvement over baseline acoustic systems.

03 Effective cross-lingual and cross-domain adaptation.
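
The findings quote a word error rate (WER) reduction without saying whether the figure is absolute or relative, and the two can differ a lot. As a minimal sketch, assuming the standard edit-distance definition of WER, the following shows how WER and both kinds of reduction are usually computed. The function names and the example sentences and rates are hypothetical illustrations, not values or code from the paper.

```python
from typing import List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline_wer: float, system_wer: float) -> Tuple[float, float]:
    """Return (absolute, relative) WER reduction of a system over a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    absolute = baseline_wer - system_wer
    relative = absolute / baseline_wer
    return absolute, relative


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion over 6 words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.4f}")

    # Hypothetical rates, chosen only to show absolute vs. relative reduction.
    absolute, relative = wer_reduction(0.30, 0.25)
    print(f"absolute: {absolute:.4f}, relative: {relative:.4f}")
```

With these made-up rates, a drop from 30% to 25% WER is 5 points absolute but roughly 17% relative, which is why a reported reduction is only interpretable once its type is stated.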

Abstract

Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech. Their practical application to atypical task domains such as elderly and disordered speech across languages is often limited by the difficulty in collecting such specialist data from target speakers. This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training before being cross-domain and cross-lingual adapted to three datasets across two languages: the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech corpora; and the English TORGO dysarthric speech data, to produce UTI based articulatory features. Experiments conducted on three tasks…

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders