Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos

Kaihua Chen; Tarasha Khurana; Deva Ramanan

arXiv:2507.12646 · cs.CV · January 14, 2026

TL;DR

This paper introduces an approach to novel-view synthesis for dynamic scenes from monocular videos. It combines dynamic 3D reconstruction, diffusion-based video inpainting, and test-time finetuning, and the authors report that it outperforms almost all prior art.

Contribution

The paper proposes a method that integrates dynamic 3D scene reconstruction, a self-supervised video inpainting diffusion model (CogNVS), and test-time finetuning, which lets the model be applied zero-shot to novel test videos.

Findings

01. Reported to outperform almost all prior art in dynamic scene view synthesis

02. Uses a video inpainting diffusion model (CogNVS) that is self-supervised on in-the-wild 2D videos

03. Enables zero-shot application to novel test videos via test-time finetuning

Abstract

We explore novel-view synthesis for dynamic scenes from monocular videos. Prior approaches rely on costly test-time optimization of 4D representations or do not preserve scene geometry when trained in a feed-forward manner. Our approach is based on three key insights: (1) covisible pixels (that are visible in both the input and target views) can be rendered by first reconstructing the dynamic 3D scene and rendering the reconstruction from the novel views, and (2) hidden pixels in novel views can be "inpainted" with feed-forward 2D video diffusion models. Notably, our video inpainting diffusion model (CogNVS) can be self-supervised from 2D videos, allowing us to train it on a large corpus of in-the-wild videos. This in turn allows for (3) CogNVS to be applied zero-shot to novel test videos via test-time finetuning. We empirically verify that CogNVS outperforms almost all prior art for…
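
The split between covisible and hidden pixels described in insights (1) and (2) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single frame with a known per-pixel depth map, a pinhole intrinsics matrix `K`, and a known rigid transform `T_src_to_tgt` from the input camera to the target camera, all of which are hypothetical inputs chosen for illustration. The function name `covisibility_mask` is also made up here. Source pixels are back-projected to 3D, moved into the target camera, and re-projected; target pixels that receive no point are the ones an inpainting model would have to fill.

```python
import numpy as np


def covisibility_mask(depth, K, T_src_to_tgt):
    """Return a boolean (H, W) mask of target pixels hit by at least one source point.

    depth:        (H, W) source-view depth along the camera z-axis (assumed known).
    K:            (3, 3) pinhole intrinsics, assumed shared by both views.
    T_src_to_tgt: (4, 4) rigid transform from source to target camera coordinates.
    No z-buffering is done: this only marks coverage, it does not resolve occlusion.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0).astype(np.float64)

    # Back-project every source pixel to a 3D point in the source camera frame.
    pts_src = (np.linalg.inv(K) @ pix) * depth.ravel()

    # Move the points into the target camera frame.
    pts_h = np.vstack([pts_src, np.ones((1, h * w))])
    pts_tgt = (T_src_to_tgt @ pts_h)[:3]

    # Keep points in front of the target camera and project them to pixels.
    in_front = pts_tgt[2] > 1e-6
    proj = K @ pts_tgt[:, in_front]
    u_t = np.round(proj[0] / proj[2]).astype(int)
    v_t = np.round(proj[1] / proj[2]).astype(int)

    # Discard projections that land outside the target image.
    inside = (u_t >= 0) & (u_t < w) & (v_t >= 0) & (v_t < h)

    mask = np.zeros((h, w), dtype=bool)
    mask[v_t[inside], u_t[inside]] = True
    return mask


# Toy setup: a fronto-parallel plane at depth 2 seen by a 6x8 camera.
K = np.array([[10.0, 0.0, 4.0],
              [0.0, 10.0, 3.0],
              [0.0, 0.0, 1.0]])
depth = np.full((6, 8), 2.0)

# Same camera: every target pixel is covisible.
same_view = covisibility_mask(depth, K, np.eye(4))

# Target camera shifted along +x: content slides left in the image,
# leaving an uncovered (hidden) strip on the right edge.
T = np.eye(4)
T[0, 3] = -0.4
shifted_view = covisibility_mask(depth, K, T)
hidden = ~shifted_view
```

In this toy setup `hidden` marks the region that has no support from the input view. A real pipeline would work per frame on a dynamic reconstruction and would also need occlusion handling, which this sketch leaves out.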

Videos

Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos · SlidesLive

Taxonomy

Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies