Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

Minghui Wu; Chenxu Zhao; Anyang Su; Donglin Di; Tianyu Fu; Da An; Min He; Ya Gao; Meng Ma; Kun Yan; Ping Wang

arXiv:2407.08150 · cs.CV · September 6, 2024 · 1 citation

TL;DR

This paper introduces a large-scale multi-modal dataset combining EEG and eye-tracking data during video viewing, and proposes a hypergraph multi-modal large language model to analyze and understand diverse subjective responses for improved video understanding.

Contribution

The paper presents a novel multi-modal dataset and a hypergraph-based large language model that jointly analyzes EEG, eye-tracking, and video content for personalized video understanding.
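To make the hypergraph idea concrete, here is a minimal sketch of a generic hypergraph convolution in the style of HGNN (Feng et al., 2019). It is not HMLLM's actual architecture, which this summary does not describe. As an illustrative assumption, each node holds a feature vector from one modality (a video clip, one viewer's EEG, or one viewer's gaze), and each hyperedge groups the video node with one viewer's EEG and eye-tracking nodes, so a single edge links three modalities at once. The function names, the node layout, and the use of NumPy are hypothetical choices made for illustration.

```python
import numpy as np


def hypergraph_propagation(H, edge_weights=None):
    """Normalized propagation matrix Dv^-1/2 H W De^-1 H^T Dv^-1/2 (HGNN-style).

    H: (n_nodes, n_edges) incidence matrix, H[v, e] = 1 if node v is in hyperedge e.
    edge_weights: optional (n_edges,) hyperedge weights; defaults to all ones.
    Assumes every node is in at least one hyperedge and no hyperedge is empty.
    """
    H = np.asarray(H, dtype=float)
    n_edges = H.shape[1]
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = H @ w          # weighted vertex degrees: d(v) = sum_e w(e) * h(v, e)
    de = H.sum(axis=0)  # hyperedge degrees: number of nodes in each hyperedge
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    return dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / de) @ H.T @ dv_inv_sqrt


def hypergraph_conv(X, H, Theta, edge_weights=None):
    """One hypergraph convolution layer: propagate node features, then project.

    X: (n_nodes, d_in) node features; Theta: (d_in, d_out) projection
    (learnable in a real model, fixed here). The nonlinearity is omitted.
    """
    return hypergraph_propagation(H, edge_weights) @ X @ Theta


# Hypothetical layout: node 0 = video clip, nodes 1/2 = viewer A's EEG/gaze,
# nodes 3/4 = viewer B's EEG/gaze. Hyperedge e0 = {0, 1, 2}, e1 = {0, 3, 4}.
H = np.array([[1, 1],
              [1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))      # toy 8-dim feature vector per node
Theta = rng.normal(size=(8, 4))  # toy projection to 4 dims
out = hypergraph_conv(X, H, Theta)
print(out.shape)  # (5, 4)
```

Because the video node belongs to both hyperedges, each viewer's EEG and gaze features are mixed with the shared video features, while the two viewers interact only through that shared node. In a trained model, Theta would be learned; how HMLLM passes such features to its language model is not specified in this summary.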

Findings

1. HMLLM (Hypergraph Multi-modal Large Language Model) effectively bridges semantic gaps across modalities.

2. The dataset captures responses from viewers of diverse demographics.

3. Experimental results show improved video understanding performance.

Abstract

The understanding of video creativity and content often varies among individuals, whose focal points and cognitive levels differ with age, experience, and gender. Research in this area is currently scarce, and most existing benchmarks suffer from the following drawbacks: 1) they cover a limited number of modalities and restrict answers to short lengths; 2) the content and scenarios in their videos are excessively monotonous, conveying allegories and emotions that are overly simplistic. To bridge the gap to real-world applications, we introduce a large-scale dataset, Subjective Response Indicators for Advertisement Videos (SRI-ADV). Specifically, we recorded real changes in electroencephalographic (EEG) signals and eye-tracking regions from viewers of different demographics while they watched identical video content. Utilizing this multi-modal dataset, we developed tasks and protocols to…

Code & Models

Repositories

mininglamp-mllm/hmllm
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · COVID-19 diagnosis using AI