# Predicting adolescent depressive symptoms using teacher-reported textual descriptions of abnormal behaviors: a study based on machine learning

**Authors:** Nigela Wumaierjiang, Guoli Yan, Lidan Yuan, Jianan Song, Xiaofei Hou, Minghui Li, Ling Sun, Jiansong Zhou, Huifang Yin, Guangming Xu

PMC · DOI: 10.3389/frai.2025.1732682 · 2026-01-08

## TL;DR

This study uses machine learning to predict adolescent depressive symptoms from teacher-written reports of student behavior; the best model, Random Forest, reached a recall of 0.97 and an accuracy of 0.91.

## Contribution

A novel application of machine learning models, particularly Random Forest, to detect depressive symptoms in adolescents using teacher-reported text data.

## Key findings

- Random Forest achieved 97% recall in predicting depressive symptoms from teacher reports.
- Teacher-reported text can effectively identify adolescents with clinically significant depressive symptoms.
- Machine learning models offer a practical tool for early detection of depression in school settings.

## Abstract

This study aimed to develop and compare machine learning (ML) models for predicting depressive symptoms in adolescents, based on teacher-reported textual descriptions of student behaviors.

Participants were 441 adolescents from Tianjin, China. Their teachers provided written reports on behavioral or emotional concerns, while the students completed the Patient Health Questionnaire-9 (PHQ-9). Text data from the reports were processed using Term Frequency-Inverse Document Frequency (TF-IDF). Four ML models were trained and evaluated using an 80/20 data split and 5-fold cross-validation: Random Forest (RF), Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), and Least Absolute Shrinkage and Selection Operator (LASSO).
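To make the TF-IDF step concrete, the following is a minimal sketch of the classical weighting, tf × log(N / df), in plain Python. It is not the authors' pipeline: the function name `tfidf`, the whitespace tokenizer, and the three English example reports are all assumptions made for illustration, and libraries such as scikit-learn use a smoothed variant of the formula by default.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency (share of the document) times inverse
            # document frequency (rarity across the collection).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented example reports, not data from the study.
reports = [
    "often withdrawn and quiet in class",
    "quiet but attentive in class",
    "frequently absent and withdrawn",
]
vectors = tfidf(reports)
```

In this sketch, a term that occurs in only one report (such as "often") receives a larger weight than a term shared across reports (such as "quiet"), which is the property that lets a downstream classifier focus on distinctive wording.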

PHQ-9 screening identified 71.7% (n = 316) of adolescents with clinically significant depressive symptoms (score ≥10). The Random Forest (RF) model demonstrated superior performance, achieving a recall of 0.97, accuracy of 0.91, precision of 0.92, and F1-score of 0.92. SVM and XGBoost also showed good performance, while LASSO was the weakest. The analysis demonstrated that teacher reports could identify depressive symptoms with up to 97% recall.
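As an aid to reading these metrics, the sketch below computes accuracy, precision, recall, and F1 from confusion-matrix counts. The helper name `classification_metrics` is hypothetical. The only figures taken from the abstract are the sample size (441) and the number screening positive (316); the "flag everyone" baseline is an illustrative assumption applied to the whole sample, not a result reported by the study, whose metrics come from its own evaluation split.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative baseline: label all 441 adolescents as positive.
# 316 screened positive on the PHQ-9, so the remaining 125 become false positives.
n_total = 441
n_positive = 316
baseline = classification_metrics(
    tp=n_positive, fp=n_total - n_positive, fn=0, tn=0
)
```

Under this assumption the baseline reaches perfect recall while its accuracy and precision equal the positive rate (316/441), which is why recall is best interpreted together with precision and accuracy when most of the sample screens positive.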

Machine learning, particularly Random Forest, can effectively predict adolescent depressive symptoms from teacher-reported text. This approach offers a practical and efficient tool for early identification in school settings, facilitating timely intervention.

## Full-text entities

- **Diseases:** depressive symptoms (MESH:D003866)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12823932/full.md

---
Source: https://tomesphere.com/paper/PMC12823932