Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding
Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang

TL;DR
This paper introduces a large-scale multi-modal dataset combining EEG and eye-tracking data during video viewing, and proposes a hypergraph multi-modal large language model to analyze and understand diverse subjective responses for improved video understanding.
Contribution
The paper presents a novel multi-modal dataset and a hypergraph-based large language model that jointly analyze EEG, eye-tracking, and video content for personalized video understanding.
Findings
The Hypergraph Multi-modal Large Language Model (HMLLM) effectively bridges semantic gaps across modalities.
The dataset captures diverse demographic responses.
Experimental results show improved video understanding performance.
Abstract
Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within the videos are excessively monotonous, transmitting allegories and emotions that are overly simplistic. To bridge the gap to real-world applications, we introduce a large-scale Subjective Response Indicators for Advertisement Videos dataset, namely SRI-ADV. Specifically, we collected real changes in Electroencephalographic (EEG) and eye-tracking regions from different demographics while they viewed identical video content. Utilizing this multi-modal dataset, we developed tasks and protocols to…
