WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper

Emmanuel Akinrintoyo; Nadine Abdelhalim; Nicole Salomons

arXiv:2505.21551 · eess.AS · May 29, 2025

TL;DR

This paper improves Whisper's transcription of dementia speech by fine-tuning it on dementia speech datasets. Fine-tuning substantially reduces the word error rate and improves filler-word detection, both of which matter for diagnosis and for developing assistive technology.

Contribution

The study introduces a fine-tuning approach for Whisper using dementia-specific datasets, improving transcription accuracy and filler-word detection on dementia speech.

Findings

01. The fine-tuned medium-sized Whisper model achieved a WER of 0.24.
02. The fine-tuned model outperformed previous dementia speech recognition methods.
03. The fine-tuned models demonstrated good generalizability to unseen speech patterns.
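To make a figure like "WER of 0.24" concrete: word error rate is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code. It assumes whitespace tokenization and skips the text normalization (casing, punctuation) that real evaluations usually apply; the example transcripts are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = minimum edits turning the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical example: the model drops the filler "um" and substitutes "a" for "the".
print(word_error_rate("the boy is um taking the cookie",
                      "the boy is taking a cookie"))  # 2 edits / 7 words
```

A WER of 0.24 therefore means roughly one word error for every four reference words. Because insertions are counted, WER can exceed 1.0 when a model produces many spurious words.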

Abstract

Whisper fails to transcribe dementia speech correctly because persons with dementia (PwDs) often exhibit irregular speech patterns and disfluencies such as pauses, repetitions, and fragmented sentences. Whisper was trained on standard speech and may have had little or no exposure to dementia-affected speech. However, accurate transcription of dementia speech is vital for cost-effective diagnosis and for the development of assistive technology. In this work, we fine-tune Whisper on the open-source dementia speech dataset DementiaBank and on our in-house dataset to improve its word error rate (WER). The fine-tuning targets also include filler words, which allows us to measure the filler inclusion rate (FIR) and F1 score. The fine-tuned models significantly outperformed the off-the-shelf models. The medium-sized model achieved a WER of 0.24, outperforming previous work. Similarly, there was a notable generalisability…
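The abstract names the filler inclusion rate (FIR) and F1 score but does not define them here. The sketch below shows one plausible way such filler metrics could be computed. Every specific in it is an assumption, not the paper's definition: the filler vocabulary is hypothetical, FIR is taken as the number of fillers in the model output divided by the number in the reference, and F1 uses position-insensitive matching (multiset overlap of filler tokens) rather than an alignment.

```python
from collections import Counter

# Hypothetical filler vocabulary; the paper's actual filler list is not stated here.
FILLERS = {"um", "uh", "er", "ah", "hmm", "mm"}


def filler_counts(text: str) -> Counter:
    """Count filler tokens in a lowercased, whitespace-tokenized transcript."""
    return Counter(tok for tok in text.lower().split() if tok in FILLERS)


def filler_metrics(reference: str, hypothesis: str) -> dict:
    """Assumed FIR and filler precision/recall/F1 (position-insensitive)."""
    ref = filler_counts(reference)
    hyp = filler_counts(hypothesis)
    n_ref = sum(ref.values())
    n_hyp = sum(hyp.values())
    # Matches = per-type minimum of reference and hypothesis counts.
    matched = sum((ref & hyp).values())
    # Assumed FIR: how many fillers the model emits relative to the reference.
    fir = n_hyp / n_ref if n_ref else float("nan")
    precision = matched / n_hyp if n_hyp else 0.0
    recall = matched / n_ref if n_ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"fir": fir, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical transcripts: reference has um x2 and uh x1; output has um x1 and er x1.
print(filler_metrics("um the boy uh is um taking the cookie",
                     "um the boy is er taking the cookie"))
```

In this example the model keeps two of three fillers (FIR = 2/3), but only one is correct, so precision is 1/2, recall is 1/3, and F1 is 0.4. Separating the two numbers shows why a model can score well on inclusion while still transcribing the wrong fillers. A position-aware variant would first align the transcripts, as in WER, before counting matches.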

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Emotion and Mood Recognition