Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech

Pan-Pan Jiang; Jimmy Tobin; Katrin Tomanek; Robert L. MacDonald; Katie Seaver; Richard Cave; Marilyn Ladewig; Rus Heywood; Jordan R. Green

arXiv:2409.09190 · eess.AS · September 17, 2024

Open Access

TL;DR

This paper details Project Euphonia's advancements in creating a high-quality, diverse disordered speech dataset with extensive annotations, improving ASR systems and understanding of speech disorders.

Contribution

The paper introduces new data collection, annotation, and metadata strategies that enhance the quality and diversity of disordered speech datasets for ASR research.

Findings

01. Transcript corrections significantly improve ML model performance.

02. Inter-rater variability affects assessment consistency.

03. Metadata collection informs better disordered speech analysis.

Abstract

Project Euphonia, a Google initiative, is dedicated to improving automatic speech recognition (ASR) of disordered speech. A central objective of the project is to create a large, high-quality, and diverse speech corpus. This report describes the project's latest advancements in data collection and annotation methodologies, such as expanding speaker diversity in the database, adding human-reviewed transcript corrections and audio quality tags to 350K (of the 1.2M total) audio recordings, and amassing a comprehensive set of metadata (including more than 40 speech characteristic labels) for over 75% of the speakers in the database. We report on the impact of transcript corrections on our machine-learning (ML) research, inter-rater variability of assessments of disordered speech patterns, and our rationale for gathering speech metadata. We also consider the limitations of using automated…

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification

Methods: Sparse Evolutionary Training