Multimodal LLMs are not all you need for Pediatric Speech Language Pathology

Darren Fürst; Sebastian Steindl; Ulrich Schäfer

arXiv:2604.26568·cs.CL·April 30, 2026

TL;DR

This paper presents a hierarchical, data-augmented approach based on fine-tuned Speech Representation Models (SRMs) that outperforms the LLM-based state of the art on pediatric speech sound disorder classification, a task motivated by staffing shortages among speech-language pathologists.

Contribution

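The paper introduces a cascading classification method (binary, then type, then symptom) for fine-tuned Speech Representation Models, combined with targeted data augmentation, and reports improvements over the LLM-based state of the art on all clinical tasks in the SLPHelmUltraSuitePlus benchmark.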

Findings

01. Fine-tuned SRMs outperform the LLM-based state of the art on every evaluated task in the benchmark.

02. Targeted data augmentation mitigates biases identified in prior work and improves model robustness.

03. Hierarchical classification improves specificity in SSD diagnosis.
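
The page does not describe how the augmentation is targeted. As a minimal sketch of one common reading, the snippet below tops up under-represented classes with noisy copies until every class matches the largest one. The function names, the Gaussian-noise perturbation, and the `noise_std` value are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import Counter


def add_noise(waveform, noise_std, rng):
    """Return a copy of the waveform with Gaussian noise added to each sample."""
    return [s + rng.gauss(0.0, noise_std) for s in waveform]


def balance_with_augmentation(samples, noise_std=0.01, seed=0):
    """Oversample minority classes with noisy copies (hypothetical scheme).

    samples: list of (waveform, label) pairs, waveform being a list of floats.
    Returns the original samples plus augmented ones, so that every label
    occurs as often as the most frequent label.
    """
    if not samples:
        return []
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())

    # Group waveforms by label so augmented copies can be drawn per class.
    by_label = {}
    for waveform, label in samples:
        by_label.setdefault(label, []).append(waveform)

    balanced = list(samples)
    for label, waveforms in by_label.items():
        for _ in range(target - len(waveforms)):
            source = rng.choice(waveforms)
            balanced.append((add_noise(source, noise_std, rng), label))
    return balanced
```

In practice the perturbation would more likely be an audio-specific transform (for example pitch or tempo changes); additive noise is used here only to keep the sketch dependency-free.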

Abstract

Speech Sound Disorders (SSD) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular multi-task SLPHelmUltraSuitePlus benchmark. We propose a cascading approach that proceeds from binary classification to type classification and then to symptom classification. By fine-tuning Speech Representation Models (SRMs) and using targeted data augmentation, we mitigate biases found by previous work and improve upon all clinical tasks in the benchmark. We also apply our data augmentation approach to Automatic Speech Recognition (ASR). Our results demonstrate that SRMs consistently outperform the LLM-based state of the art across all evaluated tasks by a large margin. We publish our models and code to foster future research.
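
The cascade from binary to type to symptom classification can be sketched as follows. This is a minimal illustration, not the authors' implementation: the label strings, the `Diagnosis` container, and the toy classifiers are hypothetical stand-ins for fine-tuned SRM heads, and the sketch assumes the later stages run only when the binary stage flags a disorder.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# A classifier maps a feature vector to a label string (assumed interface).
Classifier = Callable[[Sequence[float]], str]


@dataclass
class Diagnosis:
    disordered: bool
    disorder_type: Optional[str] = None
    symptom: Optional[str] = None


def cascade_classify(features: Sequence[float],
                     binary_clf: Classifier,
                     type_clf: Classifier,
                     symptom_clf: Classifier) -> Diagnosis:
    """Run the stages in order; stop early when no disorder is detected."""
    # Stage 1: typical vs. disordered speech.
    if binary_clf(features) != "disordered":
        return Diagnosis(disordered=False)
    # Stage 2 and 3: finer-grained labels, only for flagged samples.
    return Diagnosis(True, type_clf(features), symptom_clf(features))


# Toy stand-ins so the sketch runs end to end (not real models).
def toy_binary(x):
    return "disordered" if sum(x) / len(x) > 0.5 else "typical"


def toy_type(x):
    return "type_a" if x[0] > x[-1] else "type_b"


def toy_symptom(x):
    return "symptom_x" if max(x) > 0.9 else "symptom_y"


result = cascade_classify([0.95, 0.6], toy_binary, toy_type, toy_symptom)
```

Stopping early after the binary stage means the finer-grained classifiers only see samples already flagged as disordered, which is one plausible way for a hierarchy to limit false alarms at the type and symptom levels.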
