# Constructing a risk screen for attention difficulty in U.S. adults using six machine learning methods

**Authors:** Ying Song, Yansun Sun, Zedan Guo, Li Yi

PMC · DOI: 10.3389/frai.2025.1704576 · Frontiers in Artificial Intelligence · 2026-01-12

## TL;DR

This study compares six machine learning models for predicting concentration (attention) difficulty in U.S. adults using NHANES 2015–2016 data. Logistic regression performed best, with AUCs of 0.881 in internal validation and 0.818 in external validation.

## Contribution

The study develops and compares six machine learning models for predicting attention difficulty in a U.S. adult population, and builds a nomogram from the best-performing model.

## Key findings

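- Logistic regression showed the best predictive value, with AUCs of 0.881 and 0.818 in internal and external validation, respectively.
- In decision curve analysis, logistic regression gave the greatest net benefit in the internal cohort, whereas Random Forest gave the largest net benefit in the external cohort at threshold probabilities of 0.2–0.3.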

## Abstract

Concentration difficulty is recognized as a hallmark of various neurologic and neuropsychiatric disorders. However, an accurate estimation of epidemiological risk factors for concentration difficulty remains severely limited.

The study aimed to develop an interpretable machine-learning (ML) model to predict concentration difficulty, and to identify its risk factors, among adults in the United States.

A total of 9,971 participants were included from the 2015–2016 cycle of the National Health and Nutrition Examination Survey (NHANES). Six ML algorithms, including Logistic Regression, ExtraTrees classifier, Bagging, Gradient Boosting, Extreme Gradient Boosting (XGBoost), and Random Forest (RF), were applied in this study. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, precision, specificity, decision curve analysis (DCA), and calibration plots. Finally, a nomogram was constructed based on the best performing model.
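The evaluation metrics named above can be illustrated with a small sketch. This is not the authors' code: the function names `roc_auc` and `threshold_metrics`, the 0.5 cut-off, and the toy labels and scores are all assumptions made for illustration. AUC is computed here in its rank-based form, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, and specificity at one probability cut-off."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels and predicted probabilities, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

auc = roc_auc(y_true, y_score)
metrics = threshold_metrics(y_true, y_score, threshold=0.5)
```

On this toy data, 14 of the 15 positive/negative pairs are ranked correctly, so `auc` is 14/15. AUC does not depend on a cut-off, whereas accuracy, precision, and specificity all change with the chosen `threshold`.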

Of the 9,971 NHANES participants, 2,146 aged 20 years and older were analyzed. Logistic regression exhibited the best clinical predictive value in both the internal and external validation sets, with AUCs of 0.881 and 0.818, respectively. Decision curve analysis showed that logistic regression yielded the greatest net benefit in the internal cohort, whereas the RF model yielded the largest net benefit in the external cohort (threshold probability: 0.2–0.3).
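The net benefit plotted in a decision curve can be sketched with the standard definition: true positives per person, minus false positives per person weighted by the odds of the threshold probability. The sketch below is illustrative and not the paper's implementation. The helper names and the toy data are assumptions, and only the 0.2–0.3 threshold range comes from the abstract.

```python
from typing import Sequence


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit of flagging everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight placed on a false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag every participant regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy cohort of 10 people (3 cases), invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.8, 0.6, 0.25, 0.35, 0.22, 0.1, 0.1, 0.05, 0.05, 0.02]

for pt in (0.2, 0.3):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f} model={model_nb:.3f} treat_all={all_nb:.3f}")
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference). Comparing two models at the same threshold, as the abstract does for logistic regression and RF, means comparing their `net_benefit` values on the same cohort.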

Logistic regression exhibited the highest clinical value in predicting concentration difficulty. These findings provide valuable insights for the recognition and management of concentration difficulty and for the design of effective intervention strategies.
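Because the best-performing model is a logistic regression, the nomogram mentioned in the methods can be understood as a rescaling of the model's linear predictor onto a points scale. The sketch below shows one common way to do this and is not taken from the paper: the intercept, the coefficients, the value ranges, and the generic predictor names are all hypothetical. The predictor with the widest contribution range spans 0 to 100 points, and the others are scaled relative to it.

```python
import math
from typing import Dict

# Hypothetical model: values are NOT from the paper, illustration only.
INTERCEPT = -3.0
PREDICTORS = {
    "predictor_a": {"beta": 0.02, "min": 20.0, "max": 80.0},
    "predictor_b": {"beta": 0.25, "min": 0.0, "max": 20.0},
    "predictor_c": {"beta": -0.15, "min": 3.0, "max": 11.0},
}


def _reference(spec: Dict[str, float]) -> float:
    """Value at which the predictor contributes least (scores 0 points)."""
    return spec["min"] if spec["beta"] >= 0 else spec["max"]


# Largest possible swing in the linear predictor from a single variable.
MAX_SPAN = max(
    abs(s["beta"]) * (s["max"] - s["min"]) for s in PREDICTORS.values()
)


def points(name: str, value: float) -> float:
    """Nomogram points for one predictor value on a 0-100 scale."""
    spec = PREDICTORS[name]
    return 100.0 * spec["beta"] * (value - _reference(spec)) / MAX_SPAN


def total_points(person: Dict[str, float]) -> float:
    """Sum of points; `person` must supply every predictor."""
    return sum(points(name, value) for name, value in person.items())


def probability_from_points(total: float) -> float:
    """Map total points back to a predicted probability."""
    baseline = INTERCEPT + sum(
        s["beta"] * _reference(s) for s in PREDICTORS.values()
    )
    linear_predictor = baseline + total * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def direct_probability(person: Dict[str, float]) -> float:
    """Ordinary logistic prediction, used to check the points mapping."""
    lp = INTERCEPT + sum(PREDICTORS[n]["beta"] * v for n, v in person.items())
    return 1.0 / (1.0 + math.exp(-lp))


person = {"predictor_a": 50.0, "predictor_b": 10.0, "predictor_c": 7.0}
score = total_points(person)
risk = probability_from_points(score)
```

The points scale is only a change of units: `probability_from_points(total_points(person))` returns the same value as `direct_probability(person)`. This is why a nomogram preserves the model's predictions while allowing risk to be read off by hand.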

## Full-text entities

- **Diseases:** neurologic disorders (MESH:D009461), neuropsychiatric disorders (MESH:D001523), concentration difficulty (MESH:C567712), attention difficulty (MESH:D001289)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12833214/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12833214/full.md

## References

62 references — full list in the complete paper: https://tomesphere.com/paper/PMC12833214/full.md

---
Source: https://tomesphere.com/paper/PMC12833214