On the importance of normative data in speech-based assessment
Zeinab Noorian, Chloé Pou-Prom, Frank Rudzicz

TL;DR
This paper shows that augmenting a sparse Alzheimer's disease (AD) dataset with normative data, combined with minority-class oversampling, improves binary AD classification beyond previous state-of-the-art results on DementiaBank.
Contribution
It introduces an approach that combines scarce patient data with larger normative datasets and applies minority-class oversampling to improve AD classification performance.
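As a minimal sketch of the data-combination step, assume each speaker is already represented by a numeric feature vector and that every normative speaker is treated as a non-AD control. Under those assumptions, merging reduces to relabelling and concatenation. The helper names `combine_with_normative` and `imbalance_ratio`, and the label encoding (1 = AD, 0 = non-AD), are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import List, Tuple

# (feature vector, label) pairs; label 1 = AD, 0 = non-AD (assumed encoding).
Sample = Tuple[List[float], int]


def combine_with_normative(patient: List[Sample], normative: List[List[float]],
                           control_label: int = 0) -> List[Sample]:
    """Append normative feature vectors to the labelled patient data.

    Assumption: every normative speaker is labelled as a non-AD control.
    """
    return list(patient) + [(list(features), control_label) for features in normative]


def imbalance_ratio(samples: List[Sample]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(label for _, label in samples)
    return max(counts.values()) / min(counts.values())
```

For example, three AD and two control speakers from the patient set plus five normative speakers give 3 AD versus 7 non-AD samples (ratio 7/3). Adding normative controls therefore makes AD the minority class, which is the imbalance that minority-class oversampling then targets.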
Findings
Outperforms the state of the art in binary AD classification on DementiaBank
Combining normative and patient data is effective
Oversampling improves model accuracy
Abstract
Data sets for identifying Alzheimer's disease (AD) are often relatively sparse, which limits their ability to train generalizable models. Here, we augment such a data set, DementiaBank, with each of two normative data sets, the Wisconsin Longitudinal Study and Talk2Me, each of which employs a speech-based picture-description assessment. Through minority class oversampling with ADASYN, we outperform state-of-the-art results in binary classification of people with and without AD in DementiaBank. This work highlights the effectiveness of combining sparse and difficult-to-acquire patient data with relatively large and easily accessible normative datasets.
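The abstract names ADASYN as the oversampling method but not the implementation used. The sketch below is a from-scratch, standard-library version of the usual ADASYN recipe (He et al., 2008). Each minority (AD) sample receives a number of synthetic samples proportional to the fraction of its k nearest neighbours that belong to the majority class. Each synthetic point is interpolated between that sample and a randomly chosen minority neighbour. The function name `adasyn`, the defaults `k=5` and `beta=1.0`, Euclidean distance, and the uniform-weight fallback are illustrative assumptions. The paper's pipeline may instead rely on a library implementation such as `ADASYN` in imbalanced-learn.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def _nearest(idx: int, X: List[Vector], candidates: List[int], k: int) -> List[int]:
    """Indices of the k candidates closest (Euclidean) to X[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def adasyn(X: List[Vector], y: List[int], minority: int = 1, k: int = 5,
           beta: float = 1.0, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Return copies of X, y with synthetic minority samples appended."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_count = len(y) - len(minority_idx)
    # G: total synthetic samples needed to move beta of the way to full balance.
    G = int(round((majority_count - len(minority_idx)) * beta))
    if G <= 0 or len(minority_idx) < 2:
        return [list(v) for v in X], list(y)

    # r_i: fraction of each minority sample's k neighbours (in the full data) that are majority.
    all_idx = list(range(len(X)))
    ratios = []
    for i in minority_idx:
        neighbours = _nearest(i, X, all_idx, k)
        ratios.append(sum(y[j] != minority for j in neighbours) / len(neighbours))
    total = sum(ratios)
    if total == 0:
        # Sketch-specific choice: original ADASYN is undefined here, so weight samples uniformly.
        weights = [1 / len(minority_idx)] * len(minority_idx)
    else:
        weights = [r / total for r in ratios]

    X_out, y_out = [list(v) for v in X], list(y)
    for i, w in zip(minority_idx, weights):
        # Per-sample counts are rounded, so the total added can differ slightly from G.
        g_i = int(round(w * G))
        minority_neighbours = _nearest(i, X, minority_idx, k)
        for _ in range(g_i):
            j = rng.choice(minority_neighbours)
            lam = rng.random()  # interpolation factor in [0, 1)
            X_out.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(minority)
    return X_out, y_out
```

Synthetic points are interpolated from neighbouring speakers, so oversampling is normally applied only to the training portion of each split. This keeps points derived from held-out speakers out of training.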
Taxonomy
Topics: Natural Language Processing Techniques · Phonetics and Phonology Research · Topic Modeling
