Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech
Pan-Pan Jiang, Jimmy Tobin, Katrin Tomanek, Robert L. MacDonald, Katie Seaver, Richard Cave, Marilyn Ladewig, Rus Heywood, Jordan R. Green

TL;DR
This paper details Project Euphonia's advances in building a high-quality, diverse dataset of disordered speech with extensive annotations, with the aim of improving ASR systems and the understanding of speech disorders.
Contribution
The paper introduces new data collection, annotation, and metadata strategies that enhance the quality and diversity of disordered speech datasets for ASR research.
Findings
Transcript corrections significantly improve ML model performance.
Inter-rater variability affects assessment consistency.
Collecting speech metadata supports better analysis of disordered speech.
Abstract
Project Euphonia, a Google initiative, is dedicated to improving automatic speech recognition (ASR) of disordered speech. A central objective of the project is to create a large, high-quality, and diverse speech corpus. This report describes the project's latest advancements in data collection and annotation methodologies, such as expanding speaker diversity in the database, adding human-reviewed transcript corrections and audio quality tags to 350K (of the 1.2M total) audio recordings, and amassing a comprehensive set of metadata (including more than 40 speech characteristic labels) for over 75% of the speakers in the database. We report on the impact of transcript corrections on our machine-learning (ML) research, inter-rater variability of assessments of disordered speech patterns, and our rationale for gathering speech metadata. We also consider the limitations of using automated…
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification
Methods: Sparse Evolutionary Training
