LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction
Kanghao Chen, Hangyu Li, JiaZhou Zhou, Zeyu Wang, Lin Wang

TL;DR
This paper introduces LaSe-E2V, a novel framework that leverages language guidance and diffusion models to improve semantic consistency and quality in event-to-video reconstruction, addressing artifacts and regional blur.
Contribution
It proposes a language-guided, semantic-aware E2V reconstruction method with event-conditioned attention, new loss functions, and data augmentation strategies for training without paired data.
Findings
Outperforms existing methods on multiple challenging datasets.
Achieves higher semantic consistency and visual quality.
Effective in scenarios with fast motion and low light.
Abstract
Event cameras harness advantages such as low latency, high temporal resolution, and high dynamic range (HDR), compared to standard cameras. Due to the distinct imaging paradigm shift, a dominant line of research focuses on event-to-video (E2V) reconstruction to bridge event-based and standard computer vision. However, this task remains challenging due to its inherently ill-posed nature: event cameras only detect the edge and motion information locally. Consequently, the reconstructed videos are often plagued by artifacts and regional blur, primarily caused by the ambiguous semantics of event data. In this paper, we find language naturally conveys abundant semantic information, rendering it stunningly superior in ensuring semantic consistency for E2V reconstruction. Accordingly, we propose a novel framework, called LaSe-E2V, that can achieve semantic-aware high-quality E2V reconstruction…
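Illustration (not from the paper): the abstract's point that events carry only local edge and motion information can be seen in a toy version of the standard event-generation model, where a pixel fires when its log-intensity change crosses a threshold. The sketch below is a simplification under stated assumptions: the function name `events_between`, the threshold value, and the tiny one-row frames are all hypothetical, and real sensors track a per-pixel reference level asynchronously rather than comparing two frames. It shows that static pixels emit nothing and that a scene and a 3x brighter copy of it yield identical events, so absolute intensity cannot be recovered from events alone.

```python
import math

def events_between(prev, curr, threshold=0.2):
    """Idealized event model (assumption): emit (x, y, polarity) wherever
    the log-intensity change between two frames reaches the threshold."""
    events = []
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            delta = math.log(c) - math.log(p)
            if abs(delta) >= threshold:
                events.append((x, y, 1 if delta > 0 else -1))
    return events

# Scene A: a bright edge moves one pixel to the right.
frame_a0 = [[10.0, 100.0, 10.0, 10.0]]
frame_a1 = [[10.0, 10.0, 100.0, 10.0]]

# Scene B: the same motion, but every pixel is 3x brighter.
frame_b0 = [[3 * v for v in row] for row in frame_a0]
frame_b1 = [[3 * v for v in row] for row in frame_a1]

events_a = events_between(frame_a0, frame_a1)
events_b = events_between(frame_b0, frame_b1)

# Only the two pixels the edge passes through fire; both scenes agree.
print(events_a)
print(events_a == events_b)
```

Because two different videos map to the same event stream here, a reconstruction method needs some extra prior to choose between them; the abstract proposes language as that source of semantic information.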
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Softmax · Attention Is All You Need · Diffusion
