Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation
Vikram C. Mathad, Julie M. Liss, Kathy Chapman, Nancy Scherer, and Visar Berisha

TL;DR
This paper introduces the objective articulation measure (OAM), which uses a convolutional neural network to analyze consonant-vowel (CV) transitions and evaluate articulation in impaired and accented speech.
Contribution
The study presents a deep learning-based method that objectively assesses articulation from CV transitions and shows that the resulting score correlates with perceptual ratings across diverse speech conditions.
Findings
OAM correlates with perceptual articulation measures.
The correlation holds in three contexts: adult dysarthric speech, the speech of children with cleft lip/palate, and accented English speech from native Mandarin and Spanish speakers.
Abstract
Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.
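The pipeline described in the abstract (segment CV regions around vowel onsets, obtain consonant posteriors, reduce them to a score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the function names `segment_cv_regions` and `posterior_score`, the 50 ms window on each side of the onset, and the choice to average the posterior of the intended consonant. The abstract does not state the exact OAM formula, and the pre-trained CNN is not reproduced; the posteriors are simply taken as given.

```python
import numpy as np


def segment_cv_regions(signal, sample_rate, vowel_onsets_s,
                       before_s=0.05, after_s=0.05):
    """Cut a fixed window around each vowel onset time (in seconds).

    The window lengths are illustrative assumptions, not values from the paper.
    Onsets whose window would run past either end of the signal are skipped.
    """
    segments = []
    for onset in vowel_onsets_s:
        start = int(round((onset - before_s) * sample_rate))
        end = int(round((onset + after_s) * sample_rate))
        if start < 0 or end > len(signal):
            continue
        segments.append(signal[start:end])
    return segments


def posterior_score(posteriors, target_ids):
    """Average posterior probability assigned to the intended consonant.

    posteriors: array of shape (num_segments, num_consonant_classes), one row
                per CV segment, as a consonant classifier might output.
    target_ids: intended consonant class index for each segment.

    This reduction is a hypothetical stand-in for the OAM, not its definition.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    target_ids = np.asarray(target_ids)
    picked = posteriors[np.arange(len(target_ids)), target_ids]
    return float(picked.mean())


# Usage with synthetic data: one second of silence at 16 kHz and two onsets.
signal = np.zeros(16000)
cv_segments = segment_cv_regions(signal, 16000, [0.30, 0.70])

# Hypothetical posteriors over three consonant classes for the two segments.
example_posteriors = [[0.8, 0.1, 0.1],
                      [0.2, 0.5, 0.3]]
score = posterior_score(example_posteriors, [0, 1])
```

Under this assumed reduction, a higher score means the classifier assigns more probability to the consonant the speaker intended, which is the intuition behind relating posteriors to articulatory precision.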
