PROCESS-2: A Benchmark Speech Corpus for Early Cognitive Impairment Detection
Madhurananda Pahar, Caitlin H. Illingworth, Bahman Mirheidari, Hend Elghazaly, Fritz Peters, Sophie Young, Wing-Zin Leung, Labhpreet Kaur, Daniel Blackburn, Heidi Christensen

TL;DR
PROCESS-2 is a large, clinically validated speech dataset for the automatic detection of cognitive impairment, supporting scalable and non-invasive clinical research.
Contribution
The paper introduces PROCESS-2, a comprehensive, clinically validated speech corpus with standardized tasks and metadata, enabling reproducible research in cognitive impairment detection.
Findings
Demonstrated clinically meaningful group separation
Achieved stable baseline modelling performance
Validated dataset quality and demographic balance
Abstract
Speech-based analysis offers a scalable and non-invasive approach for detecting cognitive decline, yet progress has been constrained by the limited availability of clinically validated datasets collected under realistic conditions. We introduce PROCESS-2, a large-scale speech dataset designed to support research on the automatic assessment of cognitive impairment from spontaneous and task-oriented speech. The dataset comprises recordings from 400 participants: 200 healthy controls, 150 with a diagnosis of mild cognitive impairment (MCI), and 50 with a diagnosis of dementia, collected using the CognoMemory digital assessment platform. Each participant completed a single assessment session, including picture description and verbal fluency tasks, and each recording is accompanied by a manually verified transcript and participant-level metadata. PROCESS-2 contains approximately 21 hours of speech audio with predefined train/test partitions. Comprehensive technical…
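As a back-of-envelope check on the figures in the abstract, the sketch below derives the class shares, the mean amount of audio per participant, and two trivial baselines. It assumes only the counts stated above (200 / 150 / 50) and treats "approximately 21 hours" as exactly 21 hours; the names `GROUP_COUNTS`, `TOTAL_AUDIO_HOURS`, and `corpus_summary` are hypothetical and not part of any released PROCESS-2 tooling. Unweighted average recall (macro recall) is included as a common metric for imbalanced classes, not as a claim about the metric the authors report.

```python
"""Back-of-envelope statistics for PROCESS-2, using only figures from the abstract."""

# Participant counts per diagnostic group, as stated in the abstract.
GROUP_COUNTS = {"healthy_control": 200, "mci": 150, "dementia": 50}
# "approximately 21 hours" of audio; treated as exactly 21 here for illustration.
TOTAL_AUDIO_HOURS = 21.0


def corpus_summary(counts: dict[str, int], total_hours: float) -> dict:
    """Return class shares, mean audio per participant, and two trivial baselines."""
    n = sum(counts.values())
    shares = {group: c / n for group, c in counts.items()}
    # Mean only: per-participant durations are not given, so this hides variation.
    mean_minutes = total_hours * 60 / n
    majority = max(counts, key=counts.get)
    # Always predicting the majority class scores that class's share as accuracy...
    majority_accuracy = counts[majority] / n
    # ...but has recall 1.0 on one class and 0.0 on the rest, so its
    # unweighted average recall (macro recall) is 1 / number_of_classes.
    majority_uar = sum(1.0 if g == majority else 0.0 for g in counts) / len(counts)
    return {
        "participants": n,
        "shares": shares,
        "mean_minutes_per_participant": mean_minutes,
        "majority_class": majority,
        "majority_accuracy": majority_accuracy,
        "majority_uar": majority_uar,
    }


if __name__ == "__main__":
    summary = corpus_summary(GROUP_COUNTS, TOTAL_AUDIO_HOURS)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Under these assumptions the corpus has 400 participants split 50% / 37.5% / 12.5%, with roughly 3.15 minutes of audio per participant on average. A classifier that always predicts "healthy control" would reach 50% accuracy but only about 33% macro recall, which is why a class-balanced metric is a safer yardstick for baseline performance on a corpus with this imbalance.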
