TL;DR
This paper introduces PREX, a region-aware framework for faithful 4D video editing that preserves source-observed regions while synthesizing disoccluded or out-of-view content, targeting preservation drift, ghosting, and unstable extrapolation.
Contribution
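PREX decomposes the target spatiotemporal volume into Preserve, Reveal, and Expand roles and conditions a frozen video diffusion backbone through a region-aware adapter. The adapter is trained on proxy tasks, so no paired edited videos are required. The paper also introduces a new benchmark for evaluation.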
Findings
PREX reduces region-structured failures in 4D video editing.
PREX maintains high visual quality and strong 4D control.
The PREBench benchmark enables detailed diagnosis of editing performance.
Abstract
Existing 4D-driven video diffusion models primarily target plausible generation, but faithful 4D editing requires preserving source-observed regions while synthesizing disoccluded or out-of-view content. We identify Evidence-Role Mismatch: reliable source-backed evidence, unreliable rendered cues, and unsupported regions are entangled in a single conditioning signal, causing preservation drift, ghosting, and unstable extrapolation. We propose PREX (Preserve, Reveal, Expand), a region-aware framework that decomposes the target spatiotemporal volume into Preserve, Reveal, and Expand roles according to observation support and scene extent. PREX builds observation-backed appearance cues with calibrated confidence and injects them into a frozen video diffusion backbone through a region-aware adapter, trained with proxy tasks without requiring paired edited videos. We further introduce…
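As a rough illustration of the role decomposition described in the abstract, the sketch below labels each target location as Preserve, Reveal, or Expand from two boolean inputs. It is not the authors' implementation: the names `Role`, `assign_role`, and `label_volume`, the two boolean masks, and the exact rule (observed maps to Preserve, unobserved but inside the source scene extent maps to Reveal, everything else maps to Expand) are assumptions based only on the abstract's wording about "observation support and scene extent".

```python
from enum import Enum


class Role(Enum):
    PRESERVE = "preserve"  # backed by source observations
    REVEAL = "reveal"      # disoccluded: inside the scene extent, but never observed
    EXPAND = "expand"      # outside the source scene extent


def assign_role(observed: bool, in_extent: bool) -> Role:
    """Hypothetical per-location rule (an assumption, not taken from the paper).

    Observation support takes priority: any location backed by the source
    is treated as Preserve, regardless of the extent flag.
    """
    if observed:
        return Role.PRESERVE
    if in_extent:
        return Role.REVEAL
    return Role.EXPAND


def label_volume(observed, in_extent):
    """Label a [T][H][W] volume given two nested lists of booleans.

    `observed` marks locations with source observation support;
    `in_extent` marks locations inside the source scene extent.
    """
    return [
        [
            [assign_role(o, e) for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(o_frame, e_frame)
        ]
        for o_frame, e_frame in zip(observed, in_extent)
    ]


if __name__ == "__main__":
    # One frame, one row, three locations: observed, disoccluded, out of view.
    observed = [[[True, False, False]]]
    in_extent = [[[True, True, False]]]
    roles = label_volume(observed, in_extent)
    print([r.value for r in roles[0][0]])
```

In this toy form the roles partition the volume, so every location gets exactly one label. The calibrated confidence that the abstract attaches to appearance cues is not modeled here.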
