Learning Cross-Joint Attention for Generalizable Video-Based Seizure Detection
Omar Zamzam, Takfarinas Medani, Chinmay Chinara, Richard Leahy

TL;DR
This paper introduces a joint-centric attention model that combines body-joint dynamics with transformer-based attention to improve cross-subject generalization in video-based seizure detection.
Contribution
It proposes a cross-joint attention framework that models body dynamics while suppressing background bias, improving seizure detection on subjects not seen during training.
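The cross-joint attention idea can be illustrated with a minimal NumPy sketch. It assumes each joint has already been turned into T tokens of dimension D (a hypothetical (J, T, D) layout). It uses a single attention head with random projection weights and omits the layer norms, MLPs, and multiple heads a real transformer block would have. The summary does not say how the authors arrange spatial and temporal attention, so here every joint/time token attends to every other one jointly. `cross_joint_attention` is a hypothetical name, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_joint_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over all joint/time tokens (illustrative).

    tokens: (J, T, D) array, one D-dim token per joint j and time step t
    (hypothetical layout). Flattening joints and time into one sequence lets
    each token attend to every joint at every time step, so spatial
    (joint-to-joint) and temporal interactions share one attention map.
    """
    J, T, D = tokens.shape
    x = tokens.reshape(J * T, D)          # one sequence of J*T tokens
    q, k, v = x @ w_q, x @ w_k, x @ w_v   # each (J*T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)       # (J*T, J*T), each row sums to 1
    out = attn @ v                        # (J*T, d)
    return out.reshape(J, T, -1), attn

rng = np.random.default_rng(0)
J, T, D, d = 17, 4, 32, 16                # 17 COCO-style keypoints (assumption)
tokens = rng.standard_normal((J, T, D))
w_q, w_k, w_v = (rng.standard_normal((D, d)) / np.sqrt(D) for _ in range(3))
out, attn = cross_joint_attention(tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)              # (17, 4, 16) (68, 68)
```

Because the sequence is flattened across joints and time, the attention map has (J·T)² entries. Factorizing it into separate spatial and temporal attention, as some ViViT variants do, is a common way to reduce that cost; which option the paper uses is not stated here.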
Findings
Outperforms state-of-the-art methods on unseen subjects
Effectively models spatial and temporal body part interactions
Enhances generalization in seizure detection
Abstract
Automated seizure detection from long-term clinical videos can substantially reduce manual review time and enable real-time monitoring. However, existing video-based methods often struggle to generalize to unseen subjects due to background bias and reliance on subject-specific appearance cues. We propose a joint-centric attention model that focuses exclusively on body dynamics to improve cross-subject generalization. For each video segment, body joints are detected and joint-centered clips are extracted, suppressing background context. These joint-centered clips are tokenized using a Video Vision Transformer (ViViT), and cross-joint attention is learned to model spatial and temporal interactions between body parts, capturing coordinated movement patterns characteristic of seizure semiology. Extensive cross-subject experiments show that the proposed method consistently outperforms…
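The first two pipeline steps in the abstract (cropping joint-centered clips, then tokenizing them ViViT-style) can be sketched as follows. This is a minimal illustration under stated assumptions: keypoints are taken as already produced by some upstream pose estimator in (x, y) pixel coordinates, the crop size, tubelet length, patch size, and embedding width are arbitrary choices, and `extract_joint_clips` and `tubelet_tokens` are hypothetical helpers. The tokenizer follows ViViT's tubelet embedding (non-overlapping spatiotemporal tubes, each flattened and linearly projected), with a random matrix standing in for learned weights.

```python
import numpy as np

def extract_joint_clips(video, keypoints, size):
    """Crop a size x size window centred on each joint in every frame.

    video: (T, H, W, C); keypoints: (T, J, 2) as (x, y) pixel coords.
    Returns (J, T, size, size, C). Areas outside the frame are zero-padded,
    so only pixels near each joint (not the wider background) are kept.
    """
    T, H, W, C = video.shape
    J = keypoints.shape[1]
    half = size // 2
    padded = np.pad(video, ((0, 0), (half, half), (half, half), (0, 0)))
    clips = np.zeros((J, T, size, size, C), dtype=video.dtype)
    for t in range(T):
        for j in range(J):
            x, y = np.round(keypoints[t, j]).astype(int)
            x, y = np.clip(x, 0, W - 1), np.clip(y, 0, H - 1)
            # In padded coordinates the joint sits at (y + half, x + half),
            # so the window centred on it starts at (y, x).
            clips[j, t] = padded[t, y:y + size, x:x + size]
    return clips

def tubelet_tokens(clip, t_len, patch, w_embed):
    """ViViT-style tubelet embedding for one joint clip of shape (T, S, S, C).

    Splits the clip into non-overlapping t_len x patch x patch tubes,
    flattens each tube, and projects it with w_embed, which has shape
    (t_len * patch * patch * C, D). Returns (num_tubes, D).
    """
    T, S, _, C = clip.shape
    nt, ns = T // t_len, S // patch
    x = clip[:nt * t_len, :ns * patch, :ns * patch]
    x = x.reshape(nt, t_len, ns, patch, ns, patch, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (nt, ns, ns, t_len, patch, patch, C)
    x = x.reshape(nt * ns * ns, -1)       # one flattened vector per tube
    return x @ w_embed

rng = np.random.default_rng(0)
video = rng.random((8, 120, 160, 3)).astype(np.float32)
keypoints = rng.uniform([0, 0], [160, 120], size=(8, 17, 2))  # stand-in poses
clips = extract_joint_clips(video, keypoints, size=32)
w_embed = (rng.standard_normal((2 * 16 * 16 * 3, 64)) * 0.01).astype(np.float32)
tokens = np.stack([tubelet_tokens(c, 2, 16, w_embed) for c in clips])
print(clips.shape, tokens.shape)  # (17, 8, 32, 32, 3) (17, 16, 64)
```

The resulting (J, N, D) array holds one token sequence per joint, which is the kind of input a cross-joint attention layer would consume. Cropping before tokenizing is what keeps room layout and bed appearance out of the tokens.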
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Human Pose and Action Recognition · Emotion and Mood Recognition
