Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

TL;DR
This paper introduces a new audio-visual autism behavior recognition task, presents a large dataset, and demonstrates that multimodal models improve recognition accuracy and explanation capabilities in autism screening.
Contribution
It defines the novel problem of audio-visual autism behavior recognition, introduces AV-ASD, currently the largest video dataset for behavior-based autism screening, and explores foundation models and multimodal large language models for improved performance.
Findings
Multimodal integration improves autism behavior recognition accuracy.
The AV-ASD dataset covers extensive autism-related behaviors.
Using large language models enhances interpretability in autism recognition.
Abstract
In this article, we introduce a new problem, audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define audio-visual autism behavior recognition as the task of recognizing autism-related behaviors from audio and visual cues, including any speech present in the audio. To facilitate this new research direction, we collected an audio-visual autism spectrum dataset (AV-ASD), currently the largest video dataset for autism screening using a behavioral approach. It covers an extensive range of autism-associated behaviors, including those related to social communication and interaction. To pave the way for further research on this new problem, we extensively explored leveraging foundation models and multimodal large language models across different…
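The abstract does not specify how audio, visual, and speech cues are combined. Purely as an illustration of what "using audio and visual cues" can mean in practice, the sketch below shows one common, generic scheme: late fusion, where each modality independently scores the same set of behavior classes and the per-modality probabilities are averaged with weights. This is an assumption for illustration and is not presented as the authors' method; the modality names, weights, logits, and the helper names `softmax`, `late_fusion`, and `predict` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(modality_logits: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-modality class probabilities.

    Hypothetical illustration: each modality (e.g. "visual", "audio",
    "speech") is assumed to output logits over the same behavior classes.
    """
    names = list(modality_logits)
    num_classes = len(modality_logits[names[0]])
    if any(len(modality_logits[n]) != num_classes for n in names):
        raise ValueError("all modalities must score the same classes")
    total_w = sum(weights[n] for n in names)
    fused = [0.0] * num_classes
    for n in names:
        # Convert each modality's logits to probabilities, then accumulate
        # them with a normalized weight so the fused vector sums to 1.
        for c, p in enumerate(softmax(modality_logits[n])):
            fused[c] += weights[n] * p / total_w
    return fused


def predict(modality_logits: Dict[str, Sequence[float]],
            weights: Dict[str, float]) -> int:
    """Return the index of the highest-scoring class after fusion."""
    fused = late_fusion(modality_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Made-up logits over three hypothetical behavior classes.
    logits = {
        "visual": [2.0, 0.5, 0.1],
        "audio": [0.2, 1.5, 0.3],
        "speech": [0.1, 1.8, 0.2],
    }
    weights = {"visual": 1.0, "audio": 1.0, "speech": 1.0}
    print(predict({"visual": logits["visual"]}, {"visual": 1.0}))  # visual only
    print(predict(logits, weights))  # all three modalities fused
```

With these made-up numbers the visual stream alone favors class 0, while fusing all three streams favors class 1, which simply shows that adding modalities can change the decision. Whether that change is an improvement depends on real labeled data such as AV-ASD, not on this toy example.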
Taxonomy
Topics: Child Development and Digital Technology · Music and Audio Processing · Autism Spectrum Disorder Research
