Disability-First AI Dataset Annotation: Co-designing Stuttered Speech Annotation Guidelines with People Who Stutter
Xinru Tang, Jingjin Li, Shaomei Wu

TL;DR
This paper examines the challenges of annotating stuttered speech for AI datasets and involves people who stutter (PWS) in co-designing annotation guidelines, emphasizing the importance of lived experience for accurate labels.
Contribution
It introduces a disability-first annotation approach through co-design workshops with PWS, improving annotation practices by integrating embodied knowledge.
Findings
Co-design with PWS enhances annotation accuracy.
Embodied knowledge improves dataset quality.
Tensions exist between the complexity of disability and static labels.
Abstract
Despite efforts to increase the representation of disabled people in AI datasets, accessibility datasets are often annotated by crowdworkers without disability-specific expertise, leading to inconsistent or inaccurate labels. This paper examines these annotation challenges through a case study of annotating speech data from people who stutter (PWS). Given the variability of stuttering and differing views on how it manifests, annotating and transcribing stuttered speech remains difficult, even for trained professionals. Through interviews and co-design workshops with PWS and domain experts, we identify challenges in stuttered speech annotation and develop practices that integrate the lived experiences of PWS into the annotation process. Our findings highlight the value of embodied knowledge in improving dataset quality, while revealing tensions between the complexity of disability…
Taxonomy
Topics: Stuttering Research and Treatment · Text Readability and Simplification · Writing and Handwriting Education
