TL;DR
UniE2F is a unified diffusion-based framework that reconstructs high-fidelity video frames from sparse event camera data, extends to video frame interpolation and prediction, and significantly outperforms prior methods.
Contribution
The paper presents a novel unified diffusion model that reconstructs, interpolates, and predicts video frames from event data, leveraging a pre-trained video diffusion prior and event-based residual guidance.
Findings
Outperforms previous methods quantitatively and qualitatively
Effective zero-shot frame interpolation and prediction
High-fidelity video reconstruction from sparse event data
Abstract
Event cameras excel at high-speed, low-power, and high-dynamic-range scene perception. However, as they fundamentally record only relative intensity changes rather than absolute intensity, the resulting data streams suffer from a significant loss of spatial information and static texture details. In this paper, we address this limitation by leveraging the generative prior of a pre-trained video diffusion model to reconstruct high-fidelity video frames from sparse event data. Specifically, we first establish a baseline model by directly applying event data as a condition to synthesize videos. Then, based on the physical correlation between the event stream and video frames, we further introduce the event-based inter-frame residual guidance to enhance the accuracy of video frame reconstruction. Furthermore, we extend our method to video frame interpolation and prediction in a zero-shot…
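The abstract does not say how the event-based inter-frame residual guidance is computed. As background on the "physical correlation" it mentions: under the standard event generation model, each event signals a log-intensity change of roughly ±C at its pixel, where C is the sensor's contrast threshold. Summing event polarities between two frame timestamps therefore approximates the per-pixel log-intensity residual between those frames. The sketch below illustrates only that relation. It is not the paper's implementation. The function names `event_log_residual` and `warp_frame` and the threshold value C = 0.2 are hypothetical assumptions.

```python
import numpy as np


def event_log_residual(xs, ys, ps, height, width, contrast_threshold=0.2):
    """Approximate the per-pixel log-intensity change between two frames
    from the events fired in between (standard event generation model).

    xs, ys: integer pixel coordinates of each event (numpy arrays)
    ps:     event polarities, +1 or -1 (numpy array)
    contrast_threshold: assumed, sensor-dependent value C (hypothetical default)
    """
    residual = np.zeros((height, width), dtype=np.float64)
    # np.add.at accumulates correctly when several events hit the same pixel
    np.add.at(residual, (ys, xs), ps.astype(np.float64))
    return contrast_threshold * residual


def warp_frame(prev_frame, residual, eps=1e-3):
    """Propagate an intensity frame by adding the residual in log space.
    eps avoids log(0) on dark pixels."""
    log_prev = np.log(prev_frame + eps)
    return np.exp(log_prev + residual) - eps


# Usage: two positive events at (0, 0), one negative event at (1, 1)
xs = np.array([0, 0, 1])
ys = np.array([0, 0, 1])
ps = np.array([1, 1, -1])
res = event_log_residual(xs, ys, ps, height=2, width=2)
prev = np.full((2, 2), 0.5)
nxt = warp_frame(prev, res)
```

Because events carry only relative changes, the residual cannot recover static texture on its own (pixels with no events keep a residual of zero). This is why the abstract pairs event guidance with a generative video prior.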
