Semantic-E2VID: a Semantic-Enriched Paradigm for Event-to-Video Reconstruction
Jingqian Wu, Yunbo Jia, Shengpeng Xu, and Edmund Y. Lam

TL;DR
Semantic-E2VID introduces a novel semantic-enriched framework for event-to-video reconstruction, leveraging pretrained semantic models to incorporate object-level understanding and improve reconstruction quality.
Contribution
It pioneers integrating semantic abstraction with event-based reconstruction, explicitly modeling object-level structure to enhance video recovery from event streams.
Findings
Outperforms state-of-the-art E2V methods on six benchmarks.
Effectively incorporates semantic information to improve reconstruction fidelity.
Produces more meaningful and accurate videos when guided by semantic-aware supervision.
Abstract
Event cameras provide a promising sensing modality for high-speed and high-dynamic-range vision by asynchronously capturing brightness changes. A fundamental task in event-based vision is event-to-video (E2V) reconstruction, which aims to recover intensity videos from event streams. Most existing E2V approaches formulate reconstruction as a temporal–spatial signal recovery problem, relying on temporal aggregation and spatial feature learning to infer intensity frames. While effective to some extent, this formulation overlooks a critical limitation of event data: due to the change-driven sensing mechanism, event streams are inherently semantically under-determined, lacking object-level structure and contextual information that are essential for faithful reconstruction. In this work, we revisit E2V from a semantic perspective and argue that effective reconstruction requires going beyond…
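As a rough illustration of the "temporal aggregation" step that the abstract attributes to existing E2V approaches, the sketch below bins a stream of events into a small stack of signed-polarity frames. It is not taken from the paper and is not the Semantic-E2VID method: the function name `accumulate_events`, the `(t, x, y, polarity)` tuple layout, and the uniform time binning are all assumptions chosen for illustration.

```python
def accumulate_events(events, height, width, num_bins, t_start, t_end):
    """Sum signed event polarities into num_bins temporal slices.

    Hypothetical helper for illustration only. Each event is assumed to be a
    tuple (t, x, y, polarity), where polarity > 0 marks a brightness increase
    and any other value marks a decrease.
    """
    duration = t_end - t_start
    if duration <= 0:
        raise ValueError("t_end must be greater than t_start")

    # grid[b][y][x] holds the net polarity count for pixel (x, y) in time bin b.
    grid = [[[0 for _ in range(width)] for _ in range(height)]
            for _ in range(num_bins)]

    for t, x, y, polarity in events:
        if not (t_start <= t <= t_end):
            continue  # ignore events outside the aggregation window
        # Map the timestamp to a bin index; an event at t_end goes in the last bin.
        b = min(int((t - t_start) / duration * num_bins), num_bins - 1)
        grid[b][y][x] += 1 if polarity > 0 else -1

    return grid


# Four toy events on a 2x3 sensor, split into two time bins.
events = [(0.00, 1, 0, 1), (0.02, 1, 0, 1), (0.06, 0, 1, -1), (0.10, 2, 1, 1)]
grid = accumulate_events(events, height=2, width=3, num_bins=2,
                         t_start=0.0, t_end=0.10)
```

Each slice records only where and in which direction brightness changed, which is one way to see why the abstract describes event streams as lacking object-level structure: nothing in the aggregated grid says what the changing pixels belong to.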
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Human Pose and Action Recognition
