Advanced Gesture Recognition for Autism Spectrum Disorder Detection: Integrating YOLOv7, Video Augmentation, and VideoMAE for Naturalistic Video Analysis
Amit Kumar Singh, Vrijendra Singh

TL;DR
This paper presents a novel deep learning pipeline that combines YOLOv7, video augmentation, and VideoMAE to accurately distinguish children with autism spectrum disorder (ASD) from typically developed (TD) peers in naturalistic videos.
Contribution
It introduces an integrated approach that combines YOLOv7-based detection, video augmentation, and the VideoMAE masked autoencoder for ASD detection in uncontrolled environments.
Findings
Achieved 95% accuracy in ASD classification
Surpassed previous state-of-the-art performance
Demonstrated robustness in naturalistic video settings
Abstract
Deep learning and contactless sensing technologies have significantly advanced the automated assessment of human behaviors in healthcare. In the context of autism spectrum disorder (ASD), repetitive motor behaviors such as spinning, head banging, and arm flapping are key indicators for diagnosis. This study focuses on distinguishing between children with ASD and typically developed (TD) peers by analyzing videos captured in natural, uncontrolled environments. Using the publicly available Self-Stimulatory Behavior Dataset (SSBD), we address the classification task as a binary problem, ASD vs. TD, based on stereotypical repetitive gestures. We adopt a pipeline integrating YOLOv7-based detection, extensive video augmentations, and the VideoMAE framework, which efficiently captures both spatial and temporal features through a high-ratio masking and reconstruction strategy. Our proposed…
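The "high-ratio masking" mentioned in the abstract can be illustrated with a small sketch of tube masking, the scheme VideoMAE is known for: one random spatial mask is drawn and repeated across every temporal position, so a masked patch stays hidden for the whole clip. This is only an illustration under stated assumptions, not the authors' implementation. The clip geometry (16 frames at 224x224, 16x16 patches, tubelets of 2 frames) and the 0.9 mask ratio are common VideoMAE defaults assumed here, not values confirmed by this paper, and `tube_mask` is a hypothetical helper name.

```python
import random


def tube_mask(num_temporal, num_spatial, mask_ratio, seed=None):
    """Build a tube mask as a num_temporal x num_spatial grid (True = masked).

    One set of spatial positions is sampled and reused at every temporal
    index, so each masked patch forms a "tube" through time.
    """
    rng = random.Random(seed)
    num_masked = int(round(mask_ratio * num_spatial))
    masked = set(rng.sample(range(num_spatial), num_masked))
    spatial_row = [i in masked for i in range(num_spatial)]
    # Copy the same spatial pattern for every temporal slice.
    return [list(spatial_row) for _ in range(num_temporal)]


# Assumed clip geometry (common VideoMAE defaults, not taken from the paper).
frames, height, width = 16, 224, 224
tubelet, patch = 2, 16

num_temporal = frames // tubelet                    # temporal token positions
num_spatial = (height // patch) * (width // patch)  # spatial tokens per slice

mask = tube_mask(num_temporal, num_spatial, mask_ratio=0.9, seed=0)

total_tokens = num_temporal * num_spatial
visible_tokens = sum(not m for row in mask for m in row)
print(f"{visible_tokens} of {total_tokens} tokens stay visible to the encoder")
```

Because the encoder only processes the visible tokens, a high mask ratio reduces the pre-training compute substantially, which is the efficiency the abstract alludes to.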
Taxonomy
Topics: Autism Spectrum Disorder Research · Assistive Technology in Communication and Mobility
