# Predicting autism spectrum disorder severity in children based on specific language milestones: a random forest model approach

**Authors:** Haiyi Xiong, Xueli Xiang, Xiao Liu, Ting Yang, Jinjin Chen, Jie Chen, Tingyu Li

PMC · DOI: 10.1186/s13034-025-00988-0 · Child and Adolescent Psychiatry and Mental Health · 2025-11-18

## TL;DR

This study trains separate random forest models on language milestones to predict ASD symptom severity in 574 children, reaching AUCs of 0.81 in children under 4 years and 0.85 in children aged 4 years and above.

## Contribution

Age-stratified random forest models, built on Boruta-selected features, that identify specific language milestones as predictors of ASD symptom severity in children.

## Key findings

- In children under 4 years, 14 language milestones were significant predictors of ASD severity, with “Identifies 1 picture” and “Expresses demands by language” ranked highest.
- In children aged 4 years and above, 16 milestones were significant predictors, with “Identifies 2 colors” and “Calls partner by name” the most influential.
- Random forest models achieved AUC values of 0.81 for younger children and 0.85 for older children, indicating strong predictive performance.

## Abstract

Language impairments are among the most prevalent co-occurring conditions in children with autism spectrum disorder (ASD), and delayed language milestones often serve as early developmental warning signs. However, it remains unclear whether specific language milestones can reliably predict the severity of ASD symptoms, particularly in regions where there is a long delay between initial screening and formal diagnosis.

This study included 574 children diagnosed with ASD, stratified into two age groups: under 4 years (n = 288) and 4 years or above (n = 286). A total of 33 language milestone items covering receptive, expressive, and pragmatic aspects were evaluated. The Boruta algorithm was applied to identify significant predictors of symptom severity, and random forest models were constructed separately for each age group. Nested cross-validation and grid search were used for hyperparameter tuning. Model performance was assessed using bootstrapping with 1,000 replications to estimate area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 scores.
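The nested cross-validation and grid search step described above can be illustrated with a small, dependency-free sketch. It is not the authors' pipeline: a k-nearest-neighbour classifier stands in for the random forest so the example stays short, the data are synthetic binary "milestone attained" vectors, and every name here (`knn_predict`, `grid_search`, `nested_cv`, `make_child`) is hypothetical. The point is only the structure: hyperparameters are tuned on inner folds, and each outer fold is scored on data the tuning never saw.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (Hamming distance)."""
    nearest = sorted(train, key=lambda row: sum(a != b for a, b in zip(row[0], x)))[:k]
    return int(sum(label for _, label in nearest) * 2 > k)

def accuracy(train, test, k):
    """Fraction of test rows whose label matches the k-NN prediction."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def folds(data, n_folds):
    """Yield (train, held_out) pairs; every row is held out exactly once."""
    for i in range(n_folds):
        held_out = data[i::n_folds]
        train = [row for j, row in enumerate(data) if j % n_folds != i]
        yield train, held_out

def grid_search(data, grid, n_folds=3):
    """Inner loop: choose k by mean accuracy across inner validation folds."""
    def mean_acc(k):
        return sum(accuracy(tr, va, k) for tr, va in folds(data, n_folds)) / n_folds
    return max(grid, key=mean_acc)

def nested_cv(data, grid, outer_folds=5):
    """Outer loop: tune on the outer-train split only, score on the untouched fold."""
    results = []
    for train, test in folds(data, outer_folds):
        best_k = grid_search(train, grid)
        results.append((best_k, accuracy(train, test, best_k)))
    return results

# Synthetic stand-in data (an assumption, not study data): six binary milestone
# items per child; "severe" children (label 1) attain each item less often.
rng = random.Random(0)

def make_child(severe):
    p_attained = 0.3 if severe else 0.7
    return tuple(int(rng.random() < p_attained) for _ in range(6)), severe

data = [make_child(i % 2) for i in range(120)]
rng.shuffle(data)

grid = [1, 3, 5, 7]  # odd values avoid tied votes
results = nested_cv(data, grid)
for best_k, acc in results:
    print(f"chosen k={best_k}, held-out accuracy={acc:.2f}")
```

Keeping the grid search inside each outer training split is what prevents the reported score from being inflated by tuning on the evaluation data.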

In children under 4 years, 14 features were identified as significant predictors of ASD severity, with “Identifies 1 picture” and “Expresses demands by language” ranked highest. In children aged 4 years and above, 16 features were significant, with “Identifies 2 colors” and “Calls partner by name” being the most influential. The random forest models demonstrated robust predictive performance, with AUC values of 0.81 ± 0.01 (younger group) and 0.85 ± 0.00 (older group).
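Figures such as 0.81 ± 0.01 are summaries over bootstrap replications. The sketch below shows one plausible way to produce a "mean ± spread" AUC from 1,000 resamples. It assumes the spread is a standard deviation, which this summary does not state, and it uses synthetic labels and scores rather than study data; `auc` and `bootstrap_auc` are illustrative names.

```python
import random
import statistics

def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(labels, scores, n_boot=1000, seed=0):
    """Resample cases with replacement and summarise AUC as (mean, SD)."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined without both classes, so redraw
        estimates.append(auc(ys, [scores[i] for i in idx]))
    return statistics.mean(estimates), statistics.stdev(estimates)

# Synthetic example (assumption): 100 cases, positives score higher on average.
rng = random.Random(1)
labels = [i % 2 for i in range(100)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

mean_auc, sd_auc = bootstrap_auc(labels, scores)
print(f"AUC = {mean_auc:.2f} ± {sd_auc:.2f}")
```

The same resampling loop can report accuracy, sensitivity, specificity, or F1 by swapping the metric function.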

Our findings suggest that specific early language milestones, particularly those reflecting pragmatic abilities, may serve as valuable predictors of ASD severity. Leveraging these milestones in clinical practice could support earlier severity stratification and facilitate more tailored intervention planning, particularly in primary care settings.

The online version contains supplementary material available at 10.1186/s13034-025-00988-0.

## Linked entities

- **Diseases:** autism spectrum disorder (MONDO:0005258)

## Full-text entities

- **Diseases:** Language impairments (MESH:D007806), ASD (MESH:D000067877)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12625304/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12625304/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12625304/full.md

---
Source: https://tomesphere.com/paper/PMC12625304