Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation

Sudhir Yarram; Junsong Yuan

arXiv:2407.21450·cs.CV·August 5, 2024

TL;DR

This paper introduces a novel 3D scene representation method for video extrapolation that disentangles scene geometry from motion, enabling more accurate future video prediction from new viewpoints.

Contribution

The paper proposes a disentangled two-stage approach that first forecasts ego-motion and then the residual motion of dynamic objects, improving accuracy over entangled representations.

Findings

01 Outperforms strong baselines on urban scene datasets

02 Enables high-quality rendering of future videos from novel views

03 Reduces inaccuracies caused by entangled scene representations

Abstract

Video extrapolation in space and time (VEST) enables viewers to forecast a 3D scene into the future and view it from novel viewpoints. Recent methods propose to learn an entangled representation, aiming to model layered scene geometry, motion forecasting and novel view synthesis together, while assuming simplified affine motion and homography-based warping at each scene layer, leading to inaccurate video extrapolation. Instead of entangled scene representation and rendering, our approach chooses to disentangle scene geometry from scene motion, via lifting the 2D scene to 3D point clouds, which enables high quality rendering of future videos from novel views. To model future 3D scene motion, we propose a disentangled two-stage approach that initially forecasts ego-motion and subsequently the residual motion of dynamic objects (e.g., cars, people). This approach ensures more precise…
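
The geometry and motion steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a pinhole camera with intrinsics `K`, a per-pixel depth map, a 4x4 rigid transform `T_ego` for the forecast ego-motion, and a per-point `residual` displacement that applies only where `dynamic_mask` is true. All function and parameter names here are hypothetical, and in the paper these quantities would be predicted by learned networks rather than supplied by hand.

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map (H, W) to camera-frame 3D points (H*W, 3).

    Assumes a pinhole camera with 3x3 intrinsics K (an assumption of this sketch).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates (u, v, 1), one row per pixel.
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pixels @ np.linalg.inv(K).T          # back-projected rays with z = 1
    return rays * depth.reshape(-1, 1)          # scale each ray by its depth

def forecast_points(points, T_ego, residual, dynamic_mask):
    """Two-stage motion: rigid ego-motion first, then residual object motion.

    T_ego maps points from the current camera frame to the future camera frame.
    residual is (N, 3); it is added only where dynamic_mask (N,) is True.
    """
    homog = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    moved = (homog @ T_ego.T)[:, :3]            # stage 1: applies to every point
    moved[dynamic_mask] += residual[dynamic_mask]  # stage 2: dynamic points only
    return moved

def project(points, K):
    """Project camera-frame 3D points (N, 3) to pixel coordinates (N, 2)."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]
```

Rendering from a novel view would, under the same assumptions, amount to applying one more rigid transform to the forecast points before calling `project`. Keeping geometry (the point cloud) separate from motion (the two transforms) is what lets each stage be changed independently in this sketch.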


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques