Hybrid Vision Transformer-Mamba Framework for Autism Diagnosis via Eye-Tracking Analysis
Wafaa Kasri, Yassine Himeur, Abigail Copiaco, Wathiq Mansoor, Ammar Albanna, and Valsamma Eapen

TL;DR
This paper introduces a hybrid deep learning framework that combines Vision Transformers and Vision Mamba to improve the accuracy of autism diagnosis from eye-tracking data, using attention-based fusion of visual, speech, and facial cues together with explainable AI techniques.
Contribution
It presents a novel hybrid ViT-Mamba model that fuses multimodal cues for ASD detection and outperforms existing methods while offering high accuracy and interpretability.
Findings
Achieved 0.96 accuracy and 0.95 F1-score on the Saliency4ASD dataset
Outperformed existing ASD detection methods
Demonstrated high sensitivity (0.97) and specificity (0.94)
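The figures reported for this work (accuracy, F1-score, sensitivity, specificity) are standard binary-classification metrics. As a hedged reference, here is a minimal Python sketch of how they are usually derived from confusion-matrix counts, assuming ASD is treated as the positive class. The function name `screening_metrics` and the counts in the usage line are hypothetical illustrations, not values or code from the paper.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Binary screening metrics from confusion-matrix counts.

    Assumption: ASD is the positive class, typically developing is negative.
    """
    sensitivity = tp / (tp + fn)  # true positive rate: ASD cases correctly flagged
    specificity = tn / (tn + fp)  # true negative rate: non-ASD cases correctly cleared
    precision = tp / (tp + fp)    # fraction of ASD predictions that are correct
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # F1 is the harmonic mean of precision and sensitivity (recall)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Usage with made-up counts (not the paper's confusion matrix):
metrics = screening_metrics(tp=40, fn=10, tn=45, fp=5)
```

Note that F1 depends on precision and therefore on class balance, so it is not determined by sensitivity and specificity alone.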
Abstract
Accurate Autism Spectrum Disorder (ASD) diagnosis is vital for early intervention. This study presents a hybrid deep learning framework combining Vision Transformers (ViT) and Vision Mamba to detect ASD using eye-tracking data. The model uses attention-based fusion to integrate visual, speech, and facial cues, capturing both spatial and temporal dynamics. Unlike traditional approaches that rely on handcrafted features, it applies state-of-the-art deep learning and explainable AI techniques to improve both diagnostic accuracy and transparency. Evaluated on the Saliency4ASD dataset, the proposed ViT-Mamba model outperformed existing methods, achieving 0.96 accuracy, 0.95 F1-score, 0.97 sensitivity, and 0.94 specificity. These findings show the model's promise for scalable, interpretable ASD screening, especially in resource-constrained or remote clinical settings where access to expert diagnosis is limited.
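The abstract describes the fusion step only as "attention-based". As one common possibility, not necessarily the authors' design, the sketch below scores each modality embedding against a query vector with scaled dot-product attention, normalizes the scores with a softmax, and returns the weighted sum. The names `attention_fuse` and `softmax`, the toy two-dimensional embeddings, and the fixed query are hypothetical; in a trained model the embeddings would come from the ViT/Mamba backbones and other encoders, and the query would be learned.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    modalities: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fuse per-modality embeddings into one vector via attention weights.

    modalities: one embedding per modality (e.g. visual, speech, facial),
                all with the same length as `query` (hypothetical setup).
    query: learned in a real model; fixed here for illustration.
    """
    dim = len(query)
    scale = math.sqrt(dim)
    # Scaled dot-product score between the query and each modality embedding
    scores = [sum(q * x for q, x in zip(query, emb)) / scale for emb in modalities]
    weights = softmax(scores)
    # Attention-weighted sum of the modality embeddings, one dimension at a time
    fused = [sum(w * emb[i] for w, emb in zip(weights, modalities)) for i in range(dim)]
    return fused, weights


# Usage with toy embeddings (hypothetical values):
visual, speech, facial = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
fused, weights = attention_fuse([visual, speech, facial], query=[1.0, 0.0])
```

Because the weights sum to one per input, they can also be read as a rough indication of how much each modality contributed to a given fused representation, which is one simple route toward the interpretability the abstract emphasizes.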
Taxonomy
Topics: Autism Spectrum Disorder Research · Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology
