SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance

Minghan Yang; Lan Yang; Ke Li; Honggang Zhang; Kaiyue Pang; Yizhe Song

arXiv:2602.21819 · cs.CV · March 2, 2026

Open Access

TL;DR

SemVideo introduces a hierarchical semantic guidance framework for reconstructing videos from brain activity, significantly improving visual accuracy and temporal coherence over previous methods.

Contribution

The paper presents SemVideo, a novel fMRI-to-video reconstruction method that uses hierarchical semantic cues and a tripartite attention architecture, advancing the state of the art in neural decoding.

Findings

01 Achieves superior semantic alignment with brain signals.

02 Enhances temporal coherence in reconstructed videos.

03 Sets new benchmarks on CC2017 and HCP datasets.

Abstract

Reconstructing dynamic visual experiences from brain activity provides a compelling avenue for exploring the neural mechanisms of human visual perception. While recent progress in fMRI-based image reconstruction has been notable, extending this success to video reconstruction remains a significant challenge. Current fMRI-to-video reconstruction approaches consistently encounter two major shortcomings: (i) inconsistent visual representations of salient objects across frames, leading to appearance mismatches; (ii) poor temporal coherence, resulting in motion misalignment or abrupt frame transitions. To address these limitations, we introduce SemVideo, a novel fMRI-to-video reconstruction framework guided by hierarchical semantic information. At the core of SemVideo is SemMiner, a hierarchical guidance module that constructs three levels of semantic cues from the original video stimulus:…

Taxonomy

Topics: Face Recognition and Perception · EEG and Brain-Computer Interfaces · Visual Attention and Saliency Detection