Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition

Shujie Hu; Shansong Liu; Xurong Xie; Mengzhe Geng; Tianzi Wang; Shoukang Hu; Mingyu Cui; Xunying Liu; Helen Meng

arXiv:2203.10274 · eess.AS · March 22, 2022

Open Access

TL;DR

This paper introduces a cross-domain acoustic-to-articulatory inversion method using neural networks to generate articulatory features, significantly improving disordered speech recognition accuracy across datasets.

Contribution

It presents a novel cross-domain A2A inversion approach with neural models and feature adaptation, enhancing disordered speech recognition performance.

Findings

01. Achieved a WER of 24.82% on UASpeech, the lowest reported.

02. Incorporating articulatory features outperforms acoustic-only systems.

03. Multi-modal system with video and data augmentation further improves results.
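
For readers unfamiliar with the metric behind the first finding, word error rate (WER) is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the scoring tool used in the paper (which this page does not name); the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.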

Abstract

Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech. Their practical application to disordered speech recognition is often limited by the difficulty of collecting such specialist data from impaired speakers. This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach in which models are trained on the parallel acoustic-articulatory data of the 15-hour TORGO corpus and then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features. Mixture density network based neural A2A inversion models were used. A cross-domain feature adaptation network was also used to reduce the acoustic mismatch between the TORGO and UASpeech data. On both tasks, incorporating the A2A generated articulatory features…
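
The abstract mentions mixture density networks for A2A inversion. In general, such a network maps an acoustic input to the parameters of a Gaussian mixture over the articulatory target and is trained by minimizing the negative log-likelihood of the observed target. The sketch below shows only that generic loss for a single frame with diagonal covariances; the function name, argument layout and toy values are assumptions for illustration and are not taken from the paper.

```python
import math


def mdn_negative_log_likelihood(target, weights, means, stds):
    """Negative log-likelihood of `target` under a diagonal Gaussian mixture.

    target:  list of D floats (e.g. one frame of articulatory features)
    weights: list of K mixing weights that sum to 1
    means:   K lists of D component means
    stds:    K lists of D positive standard deviations
    """
    log_terms = []
    for weight, mean, std in zip(weights, means, stds):
        log_prob = math.log(weight)
        for x, m, s in zip(target, mean, std):
            # Log density of a 1-D Gaussian, summed over dimensions.
            log_prob += -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((x - m) / s) ** 2
        log_terms.append(log_prob)

    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    log_likelihood = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return -log_likelihood


# Toy check: one unit-variance component centred on the target.
single_component_nll = mdn_negative_log_likelihood([0.0], [1.0], [[0.0]], [[1.0]])
```

In a real system the mixture parameters would be network outputs and the loss would be averaged over frames using an automatic differentiation framework; the plain-Python version here is only meant to make the objective concrete.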

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders