# Deep learning based approach for Behavior classification in diagnoses of Autism Spectrum Disorder using naturalistic videos

**Authors:** Usama Jabbar, Muhammad Waseem Iqbal, Alexandru Nechifor, Mohammed Abaker, Mohammed Ahmed Khairalseed, Valentin Marian Antohi, Costinela Fortea, Catalin Aurelian Stefanescu

PMC · DOI: 10.3389/fncom.2026.1626315 · Frontiers in Computational Neuroscience · 2026-03-18

## TL;DR

This paper introduces a deep learning model that accurately classifies behaviors in children's naturalistic videos to help diagnose autism spectrum disorder.

## Contribution

A novel CNN-GRU deep learning model is proposed for robust and accurate classification of self-stimulatory behaviors in ASD diagnosis.
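The summary does not specify the layers of the CNN-GRU, so the code below is only a minimal sketch of the general pattern, not the authors' architecture. A small convolutional encoder turns each frame into a feature vector. A GRU then aggregates those vectors over time, and a linear layer with softmax scores the behavior classes.

It is written in dependency-free Python for readability. The following are all illustrative assumptions:

- the kernel count, hidden size, and 3×3 kernels;
- the random, untrained weights;
- the labels in `CLASSES`, which are hypothetical names for the three behaviors listed in the abstract.

```python
import math
import random

# Hypothetical labels, assumed to match the three behaviors named in the abstract.
CLASSES = ["hand_flapping", "head_banging", "spinning"]


def conv2d_relu_gap(frame, kernel):
    """Valid 2-D cross-correlation with one kernel, ReLU, then global average pooling."""
    kh, kw = len(kernel), len(kernel[0])
    total, count = 0.0, 0
    for i in range(len(frame) - kh + 1):
        for j in range(len(frame[0]) - kw + 1):
            s = sum(frame[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            total += max(0.0, s)  # ReLU
            count += 1
    return total / count  # global average pooling -> one scalar per kernel


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Element-wise W x + U h + b."""
    return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(x, h, p):
    """One GRU update; the reset gate scales the recurrent term (PyTorch-style)."""
    z = [sigmoid(v) for v in affine(p["Wz"], p["Uz"], p["bz"], x, h)]  # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], p["Ur"], p["br"], x, h)]  # reset gate
    n = [math.tanh(a + ri * c + d)                                      # candidate state
         for a, ri, c, d in zip(matvec(p["Wn"], x), r, matvec(p["Un"], h), p["bn"])]
    # New state interpolates between the candidate n and the previous state h.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def init_params(n_kernels, hidden, n_classes, seed=0):
    """Random, untrained weights (illustration only)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {"kernels": [rand(3, 3) for _ in range(n_kernels)]}
    for g in "zrn":
        p["W" + g] = rand(hidden, n_kernels)  # input-to-hidden
        p["U" + g] = rand(hidden, hidden)     # hidden-to-hidden
        p["b" + g] = [0.0] * hidden
    p["Wo"], p["bo"] = rand(n_classes, hidden), [0.0] * n_classes
    return p


def classify_clip(frames, p):
    """Per-frame CNN features -> GRU over time -> class probabilities."""
    h = [0.0] * len(p["bz"])
    for frame in frames:
        x = [conv2d_relu_gap(frame, k) for k in p["kernels"]]
        h = gru_step(x, h, p)
    logits = [a + b for a, b in zip(matvec(p["Wo"], h), p["bo"])]
    return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(1)
    clip = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(5)]  # 5 frames, 8x8
    probs = classify_clip(clip, init_params(n_kernels=4, hidden=6, n_classes=len(CLASSES)))
    print(dict(zip(CLASSES, (round(q, 3) for q in probs))))
```

Because the weights are untrained, the printed probabilities are arbitrary. A real system would learn them by backpropagation in a framework such as PyTorch or JAX. The sketch only shows how the spatial stage (per-frame CNN) and the temporal stage (GRU) are chained.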

## Key findings

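- The proposed CNN-GRU model achieves a stable accuracy between 0.9284 ± 0.0039 and 0.9294 ± 0.0038 under k-fold cross-validation.
- The model outperforms the other tested architectures (3D-CNN + LSTM, MobileNet, VGG16, and EfficientNet-B7) and compares favorably with state-of-the-art methods in classifying abnormal behaviors from unstructured naturalistic videos.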
- The approach shows potential for real-world clinical use in monitoring and screening autism spectrum disorder.
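The ± figures above follow the usual k-fold reporting pattern. The model is trained and evaluated k times, each time holding out a different fold, and the per-fold accuracies are summarized as mean ± standard deviation.

The sketch below makes two assumptions that the summary does not confirm: a plain shuffled split, and the sample standard deviation. The summary states neither the value of k nor the exact convention. The fold accuracies in the usage line are invented for illustration.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices once, split them into k near-equal folds, and yield
    (train_idx, test_idx) pairs so every sample is held out exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def summarize(fold_accuracies):
    """Format per-fold accuracies as mean ± standard deviation."""
    mean = statistics.mean(fold_accuracies)
    std = statistics.stdev(fold_accuracies)  # sample (n - 1) std: an assumption
    return f"{mean:.4f} ± {std:.4f}"


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(n_samples=10, k=5)):
        print(fold, "train:", len(train), "test:", sorted(test))
    # Invented per-fold accuracies, for illustration only (not from the paper).
    print(summarize([0.90, 0.92, 0.91, 0.93, 0.89]))  # prints 0.9100 ± 0.0158
```

Several clips may be cut from the same source video. In that case, splitting by video identifier rather than by clip keeps near-duplicate frames out of both the training and test folds. This is a general precaution, not something the summary says the authors did.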

## Abstract

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication and by repetitive, stereotyped behaviors. In children, ASD is most commonly diagnosed through psychological screening tests combined with monitoring of behavioral patterns, especially repetitive behaviors such as hand-flapping, head banging, and spinning. In this work, we examine abnormal behavioral patterns that may indicate ASD in videos of children engaged in everyday activities in unstructured settings. The publicly available multiclass Self-Stimulatory Behavior Dataset (SSBD) is used to classify autistic behavior. Before training, the dataset is thoroughly pre-processed: the region of interest (ROI) is detected and each image is cropped to eliminate irrelevant background objects. Data augmentation methods are then applied to reduce overfitting and to improve training efficiency and generalization. To capture spatiotemporal features, several deep learning models are evaluated, including the proposed CNN-GRU model, 3D-CNN + LSTM, MobileNet, VGG16, and EfficientNet-B7. The experimental results show that the proposed CNN-GRU model outperforms all competing methods. Under k-fold cross-validation, the model achieves a stable accuracy between 0.9284 ± 0.0039 and 0.9294 ± 0.0038, indicating that it is robust and consistent across folds. Comparisons with state-of-the-art methods further support the effectiveness of the proposed approach. The results indicate that action-recognition-based systems can help clinicians monitor behavioral trends and enable quick, accurate, and effective ASD screening.
The proposed approach predicts behavior effectively in real-life, uncontrolled videos and shows strong potential for real-world clinical implementation as a decision-support tool.
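The abstract names two pre-training steps, ROI detection with cropping and data augmentation, but gives no details. The sketch below is one plausible way to wire them together and is not taken from the paper. It makes these assumptions:

- A person detector (hypothetical, not shown) has already produced one bounding box per frame.
- Every frame of a clip is cropped with the union of those boxes plus a small margin, so all frames share one size.
- Augmentation parameters (a horizontal flip and a brightness gain, both assumed) are drawn once per clip, so the motion pattern stays consistent across frames.
- Frames are nested lists of intensities in [0, 1].

```python
import random


def union_box(boxes):
    """Smallest (x0, y0, x1, y1) box covering every per-frame detection."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def crop_to_roi(frame, box, margin=0.1):
    """Crop an H x W frame to `box`, grown by `margin` of its size per side
    and clamped to the frame borders."""
    h, w = len(frame), len(frame[0])
    x0, y0, x1, y1 = box
    dx, dy = int((x1 - x0) * margin), int((y1 - y0) * margin)
    x0, y0 = max(0, x0 - dx), max(0, y0 - dy)
    x1, y1 = min(w, x1 + dx), min(h, y1 + dy)
    return [row[x0:x1] for row in frame[y0:y1]]


def augment_clip(frames, rng):
    """Draw augmentation parameters once per clip and apply them to every frame."""
    flip = rng.random() < 0.5      # hypothetical: horizontal flip with p = 0.5
    gain = rng.uniform(0.8, 1.2)   # hypothetical brightness-gain range
    out = []
    for frame in frames:
        rows = [r[::-1] if flip else list(r) for r in frame]
        out.append([[min(1.0, px * gain) for px in r] for r in rows])
    return out


def preprocess_clip(frames, boxes, rng=None):
    """ROI crop with one shared box, then optional clip-level augmentation."""
    box = union_box(boxes)  # boxes come from a detector that is not shown here
    cropped = [crop_to_roi(f, box) for f in frames]
    return augment_clip(cropped, rng) if rng is not None else cropped


if __name__ == "__main__":
    frame = [[(c + 10 * r) / 100 for c in range(10)] for r in range(8)]  # toy 8x10 frame
    boxes = [(2, 2, 6, 6), (3, 3, 7, 7)]  # hypothetical detections for 2 frames
    clip = preprocess_clip([frame, frame], boxes, rng=random.Random(0))
    print(len(clip), "frames of", len(clip[0]), "x", len(clip[0][0]))
```

A shared crop box also avoids the frame-to-frame jitter that independent per-frame crops would introduce. That jitter could otherwise be mistaken for motion.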

## Linked entities

- **Diseases:** Autism Spectrum Disorder (MONDO:0005258)

## Full-text entities

- **Diseases:** ASD (MESH:D000067877), neurodevelopmental disorder (MESH:D002658), autistic Behavior (MESH:D001321)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13039029/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13039029/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC13039029/full.md

---
Source: https://tomesphere.com/paper/PMC13039029