Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors
Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals

TL;DR
This study explores using ultrasound tongue imaging combined with audio analysis to automatically detect speech articulation errors in children, showing promising accuracy and potential for clinical speech therapy applications.
Contribution
It introduces a novel automated system leveraging ultrasound and audio data for detecting speech articulation errors, trained on both child and adult speech datasets.
Findings
Achieved 86.9% accuracy on typically developing speech, using adult-speech pre-training with joint ultrasound and audio input.
Best velar fronting detection correctly identified 86.6% of errors.
System shows potential for integration into speech therapy tools.
Abstract
Speech sound disorders are a common communication impairment in childhood. Because speech disorders can negatively affect the lives and the development of children, clinical intervention is often recommended. To help with diagnosis and treatment, clinicians use instrumented methods such as spectrograms or ultrasound tongue imaging to analyse speech articulations. Analysis with these methods can be laborious for clinicians; therefore, there is growing interest in its automation. In this paper, we investigate the contribution of ultrasound tongue imaging for the automatic detection of speech articulation errors. Our systems are trained on typically developing child speech and augmented with a database of adult speech using audio and ultrasound. Evaluation on typically developing speech indicates that pre-training on adult speech and jointly using ultrasound and audio gives the best results…
