TL;DR
SeeU introduces a 4D dynamics-aware framework that reconstructs the 4D world from 2D observations, models its continuous space-time dynamics, and generates unseen visual content.
Contribution
It presents a novel 2D to 4D to 2D learning framework that models continuous 4D dynamics for improved unseen content generation.
Findings
Achieves continuous, physically-consistent visual generation.
Demonstrates strong results in unseen temporal and spatial generation.
Enables advanced video editing capabilities.
Abstract
Images and videos are discrete 2D projections of the 4D world (3D space + time). Most visual understanding, prediction, and generation operate directly on 2D observations, leading to suboptimal performance. We propose SeeU, a novel approach that learns continuous 4D dynamics and generates unseen visual content. The principle behind SeeU is a new 2D→4D→2D learning framework. SeeU first reconstructs the 4D world from sparse, monocular 2D frames (2D→4D). It then learns the continuous 4D dynamics on a low-rank representation and physical constraints (discrete 4D→continuous 4D). Finally, SeeU rolls the world forward in time, re-projects it back to 2D at sampled times and viewpoints, and generates unseen regions based on spatial-temporal context awareness (4D→2D). By modeling dynamics in 4D, SeeU achieves continuous and physically-consistent novel visual…
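To make the discrete 4D→continuous 4D→2D steps concrete, here is a toy sketch in Python. It is not SeeU's implementation: the abstract does not specify the representation, so the truncated SVD basis, the polynomial time model, the pinhole camera, and every function and parameter name below (`fit_low_rank_dynamics`, `evaluate_at`, `project_pinhole`, `rank`, `degree`, `focal`) are assumptions chosen for illustration. It also leaves out the physical constraints and the generation of unseen regions.

```python
import numpy as np


def fit_low_rank_dynamics(times, points, rank=2, degree=2):
    """Fit a continuous-time, low-rank motion model to tracked 3D points.

    times:  (T,) sample times of the discrete 4D reconstruction
    points: (T, N, 3) positions of N points at each sample time
    All names here are hypothetical; this is an illustrative stand-in.
    """
    T, N, _ = points.shape
    X = points.reshape(T, N * 3)           # one row per time step
    mean = X.mean(axis=0)
    # Truncated SVD gives a small set of shared motion bases.
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    coeffs = U[:, :rank] * S[:rank]        # (T, rank) coefficients over time
    basis = Vt[:rank]                      # (rank, 3N) motion bases
    # Assumed continuous model: one polynomial in t per coefficient.
    polys = [np.polyfit(times, coeffs[:, k], degree) for k in range(rank)]
    return mean, basis, polys


def evaluate_at(model, t):
    """Evaluate the fitted model at any time t, including unseen ones."""
    mean, basis, polys = model
    c = np.array([np.polyval(p, t) for p in polys])
    return (mean + c @ basis).reshape(-1, 3)


def project_pinhole(points, focal=1.0):
    """Project camera-frame 3D points (z > 0) to 2D with a pinhole camera."""
    return focal * points[:, :2] / points[:, 2:3]
```

Fitting on a handful of time samples and then calling `evaluate_at` with a time between or beyond those samples, followed by `project_pinhole`, mirrors the "roll forward in time, then re-project to 2D" step in miniature. The low-rank basis restricts all points to a few shared motion modes, which is one common way to regularize motion estimated from sparse observations.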
