ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs

Michal Nazarczuk; Sibi Catley-Chandar; Thomas Tanay; Zhensong Zhang; Gregory Slabaugh; Eduardo Pérez-Pellitero

arXiv:2506.18792 · cs.CV · June 24, 2025

TL;DR

ViDAR is a 4D reconstruction framework that uses personalised diffusion models to generate pseudo multi-view supervision from monocular videos. It uses this supervision to train a Gaussian splatting representation, improving the quality of dynamic scene reconstruction.

Contribution

The paper presents ViDAR, a new method that leverages personalized diffusion models and a diffusion-aware loss for monocular 4D scene reconstruction.

Findings

1. Outperforms state-of-the-art methods on the DyCheck benchmark.
2. Achieves better visual quality and geometric consistency.
3. Shows strong improvements in dynamic regions.

Abstract

Dynamic Novel View Synthesis aims to generate photorealistic views of moving subjects from arbitrary viewpoints. This task is particularly challenging when relying on monocular video, where disentangling structure from motion is ill-posed and supervision is scarce. We introduce Video Diffusion-Aware Reconstruction (ViDAR), a novel 4D reconstruction framework that leverages personalised diffusion models to synthesise a pseudo multi-view supervision signal for training a Gaussian splatting representation. By conditioning on scene-specific features, ViDAR recovers fine-grained appearance details while mitigating artefacts introduced by monocular ambiguity. To address the spatio-temporal inconsistency of diffusion-based supervision, we propose a diffusion-aware loss function and a camera pose optimisation strategy that aligns synthetic views with the underlying scene geometry. Experiments…
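The abstract does not give the form of the diffusion-aware loss. As intuition only, the sketch below shows one generic way to combine supervision from the real monocular frames with down-weighted supervision from diffusion-generated pseudo-views, using per-pixel confidence weights. This is not ViDAR's loss. The function name, the L1 penalty, the `confidence` map and the weight `lam` are all hypothetical assumptions.

```python
import numpy as np

def weighted_pseudo_view_loss(render_real, gt_real, render_pseudo, gt_pseudo,
                              confidence, lam=0.5):
    """Hypothetical sketch, NOT the paper's diffusion-aware loss.

    Images are (H, W, C) arrays; `confidence` is an (H, W) map in [0, 1]
    that down-weights pixels where the diffusion pseudo-view is assumed
    to be unreliable (e.g. spatio-temporally inconsistent).
    """
    # Plain L1 photometric term against the real monocular frame.
    real_term = np.mean(np.abs(render_real - gt_real))

    # Confidence-weighted L1 term against the diffusion-generated pseudo-view.
    w = np.clip(confidence, 0.0, 1.0)[..., None]  # broadcast over channels
    weighted_err = np.sum(w * np.abs(render_pseudo - gt_pseudo))
    # Normalise by the total weight so the term stays on the same scale as L1.
    norm = max(float(np.sum(w)) * render_pseudo.shape[-1], 1e-8)
    pseudo_term = weighted_err / norm

    # `lam` (hypothetical) trades off real versus synthetic supervision.
    return real_term + lam * pseudo_term
```

Setting `confidence` to zero everywhere reduces the loss to the real-frame term alone. This reflects the general motivation the abstract gives: synthetic views should not be trusted uniformly.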
