# Use of Automation Technologies and Data Mining in Speech Recognition for Autism

**Authors:** Rongjie Mao, Yuncheng Zhu

PMC · DOI: 10.1002/brb3.71229 · Brain and Behavior · 2026-01-28

## TL;DR

This paper reviews how automated tools and data mining can help detect autism through speech analysis, highlighting progress and challenges in making these tools reliable and scalable.

## Contribution

The paper provides a structured narrative review of methodological developments in automated speech-based ASD assessment from 1994 to 2025.

## Key findings

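- Automated indices of prosody, voice quality, linguistic content, and interactional behavior, extracted with tools such as LENA, wav2vec 2.0, and Whisper, show moderate-to-high accuracy for ASD detection and meaningful associations with clinician-rated severity.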
- Data-mining methods have progressed from logistic regression and clustering to transformer-based models, but models often generalize poorly and datasets are typically small and single-site.
- Performance often degrades across languages, ages, tasks, and recording settings, and privacy, fairness, interpretability, and computational efficiency remain barriers to clinical deployment.

## Abstract

Early identification of autism spectrum disorder (ASD) is critical for improving long‐term outcomes, and speech offers a noninvasive source of clinically relevant biomarkers. However, manual speech analysis is time‐consuming and difficult to scale. With advances in digital recording, signal processing, and artificial intelligence, researchers have increasingly deployed automated tools and data‐mining methods to characterize speech and language in ASD.

This structured narrative review summarizes methodological developments in speech‐based ASD assessment from 1994 to 2025, spanning diverse tasks and recording settings and focusing on automated tools, data‐mining methods, and their clinical translation. We first consider core automated toolchains, including LENA, Praat, HTK/FAVE, CMU Sphinx, Kaldi, AutoSALT, openSMILE/eGeMAPS, diarization systems, and foundation‐model ASR systems (e.g., Whisper), as well as modern self‐supervised encoders such as wav2vec 2.0 and TRILLsson. Their typical use cases, psychometric properties, and limitations are highlighted. We then chart the progression of data‐mining and machine‐learning approaches from early logistic regression and clustering, through regularized regression, SVMs, and tree ensembles, to CNN/LSTM sequence models and transformer‐based text and speech models (e.g., BERT, LLMs).

Across these stages, automated indices of prosody, voice quality, linguistic content, and interactional behavior show moderate‐to‐high accuracy for ASD detection and meaningful associations with clinician‐rated severity. Nonetheless, various problems persist: performance often degrades across languages, ages, tasks, and recording settings; evaluation and reporting remain heterogeneous; datasets are typically small and single‐site; and privacy, fairness, interpretability, and computational efficiency pose persistent barriers to deployment, highlighting the need for target‐context benchmarking and pre‐specified evaluation/reporting.
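One way to read "target-context benchmarking" is to hold out an entire site, language, or recording setting during evaluation, so the reported score reflects performance in a context the model has not seen. The sketch below illustrates that idea with a leave-one-site-out splitter; it is an illustration under stated assumptions, not the protocol used in the reviewed studies. The function name `leave_one_site_out` and the site labels are hypothetical.

```python
from collections import defaultdict


def leave_one_site_out(site_labels):
    """Yield (held_out_site, train_indices, test_indices) for each site.

    site_labels[i] is the context of sample i (site, language, or
    recording setting). Each context is held out exactly once.
    """
    by_site = defaultdict(list)
    for index, site in enumerate(site_labels):
        by_site[site].append(index)

    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        # Train on every sample that does not come from the held-out context.
        train_idx = sorted(
            i for site, indices in by_site.items() if site != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical site labels for six recordings (illustrative only, not real data).
sites = ["clinic_a", "clinic_a", "home_b", "home_b", "clinic_c", "clinic_c"]
splits = list(leave_one_site_out(sites))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Splitting by context rather than by individual recording keeps samples from the same site out of both the training and test sets, which is what lets the score speak to cross-site degradation.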

We outline three priority strategies to guide future work toward scalable, clinically credible ASD speech assessment and longitudinal monitoring: optimize and integrate existing toolchains, enable global yet privacy‐preserving data sharing, and leverage cross‐domain innovations in enhancement, label efficiency, and explainable, edge‐ready AI.

The pipeline analyzes clinical and naturalistic speech using LENA, wav2vec 2.0, and foundation‐model ASR (Whisper) to enable scalable ASD detection and severity estimation. Future work integrates benchmarking, privacy‐preserving collaboration (federated learning), and explainable, edge‐ready AI for clinically credible assessment and longitudinal monitoring.
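The abstract names federated learning but does not specify an algorithm. As a minimal sketch, assuming the common federated averaging rule (FedAvg), each site trains locally and shares only its model parameters, which a coordinator combines weighted by local sample count. The function name `federated_average` and the numbers are hypothetical and serve only to show the arithmetic.

```python
def federated_average(client_weights, client_sizes):
    """Return the sample-size-weighted average of per-site parameter vectors.

    client_weights: list of equal-length lists of floats, one per site.
    client_sizes: number of local training samples at each site.
    Raw recordings never leave a site; only parameters are shared.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one positive sample count per parameter vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all parameter vectors must have the same length")
        for j, value in enumerate(weights):
            averaged[j] += value * size / total
    return averaged


# Hypothetical parameters from two sites with 10 and 30 local samples.
site_weights = [[1.0, 0.0], [3.0, 2.0]]
site_sizes = [10, 30]
global_weights = federated_average(site_weights, site_sizes)
print(global_weights)
```

Weighting by sample count lets larger sites contribute proportionally more. A real deployment would add safeguards such as secure aggregation, which this sketch omits.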

## Linked entities

- **Diseases:** autism spectrum disorder (MONDO:0005258), ASD (MONDO:0006664)

## Full-text entities

- **Genes:** NINL (ninein like) [NCBI Gene 22981] {aka NLP}, SHC2 (SHC adaptor protein 2) [NCBI Gene 25759] {aka SCK, SHCB, SLI}
- **Diseases:** social communication deficits (MESH:D003147), GAN (MESH:D056768), SIT (MESH:C566973), ADOS (MESH:D001321), TinyML (MESH:D007859), AVM (MESH:D002538), MFCC (MESH:D006316), repetitive (MESH:D012090), RRB (MESH:D002313), CPP (MESH:D020288), TD (MESH:D002658), ASD (MESH:D000067877), neurodevelopmental condition (MESH:D020763), XAI (MESH:C538243), NDWR (MESH:D001037), hallucination (MESH:D006212), ELMo (MESH:D007806), CCC (MESH:C535313), BiLSTM (MESH:D000088562), MLUM (MESH:D007870)
- **Chemicals:** AVA-DA (-), CVC (MESH:C506967)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12848528/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12848528/full.md

## References

84 references — full list in the complete paper: https://tomesphere.com/paper/PMC12848528/full.md

---
Source: https://tomesphere.com/paper/PMC12848528