Consistent Depth of Moving Objects in Video

Zhoutong Zhang; Forrester Cole; Richard Tucker; William T. Freeman; Tali Dekel

arXiv:2108.01166·cs.CV·August 4, 2021

TL;DR

This paper introduces a test-time training framework that estimates geometrically and temporally consistent depth for videos containing both moving objects and camera motion, which in turn supports depth-and-motion-aware video editing effects.

Contribution

It proposes a method that jointly trains a depth-prediction CNN and a scene-flow MLP at test time to obtain temporally consistent depth estimates in dynamic scenes.
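
To make the joint objective concrete, here is a minimal sketch of one way a depth/scene-flow consistency term could be written. It is an illustration under stated assumptions, not the authors' implementation: the function names `unproject` and `consistency_loss` are hypothetical, pixel correspondences between the two frames are assumed to be given, a pinhole camera with intrinsics `K` is assumed, and both frames are assumed to share one coordinate frame (camera poses are omitted for brevity).

```python
import numpy as np

def unproject(pixels, depth, K):
    """Lift pixels (N, 2) with per-pixel depth (N,) to 3D points (N, 3).

    Assumes a pinhole camera with 3x3 intrinsics K (an assumption of this sketch).
    """
    ones = np.ones((pixels.shape[0], 1))
    homog = np.concatenate([pixels, ones], axis=1)   # (N, 3) homogeneous pixels
    rays = homog @ np.linalg.inv(K).T                # rays with z = 1
    return rays * depth[:, None]

def consistency_loss(pix_i, depth_i, pix_j, depth_j, flow_ij, K):
    """Mean 3D distance between flowed frame-i points and frame-j points.

    pix_i[n] and pix_j[n] are assumed to be corresponding pixels, and
    flow_ij[n] is the predicted 3D displacement of point n from frame i to j.
    Camera motion between the frames is ignored here (hypothetical setup).
    """
    points_i = unproject(pix_i, depth_i, K)
    points_j = unproject(pix_j, depth_j, K)
    return np.mean(np.linalg.norm(points_i + flow_ij - points_j, axis=1))

# Toy usage: one point that moves along x between two frames.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
pix_i = np.array([[50.0, 50.0]])
pix_j = np.array([[150.0, 50.0]])
depth = np.array([2.0])
true_flow = unproject(pix_j, depth, K) - unproject(pix_i, depth, K)
loss_with_true_flow = consistency_loss(pix_i, depth, pix_j, depth, true_flow, K)
loss_with_zero_flow = consistency_loss(pix_i, depth, pix_j, depth, np.zeros((1, 3)), K)
```

In this toy setup the term vanishes when the predicted flow matches the displacement implied by the two depth values, and grows when depth and flow disagree, which is the kind of signal that would let both networks be trained together.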

Findings

01. Achieves accurate, temporally coherent depth maps on diverse videos.

02. Enables depth-and-motion-aware video editing effects.

03. Demonstrates robustness to various moving objects and camera motions.

Abstract

We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera. We seek a geometrically and temporally consistent solution to this underconstrained problem: the depth predictions of corresponding points across frames should induce plausible, smooth motion in 3D. We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction MLP over the entire input video. By recursively unrolling the scene-flow prediction MLP over varying time steps, we compute both short-range scene flow to impose local smooth motion priors directly in 3D, and long-range scene flow to impose multi-view consistency constraints with wide baselines. We demonstrate accurate and temporally coherent results on a variety of challenging videos…
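
The recursive unrolling described in the abstract can be sketched as follows. This is a toy illustration under assumptions, not the paper's code: the network here is a tiny, randomly initialized NumPy MLP (`make_toy_flow_mlp`, a hypothetical helper) that maps a 3D point and a time index to a one-step displacement, and `unroll_scene_flow` is a hypothetical name. One step gives short-range scene flow; chaining several steps gives long-range scene flow.

```python
import numpy as np

def make_toy_flow_mlp(seed=0, hidden=16):
    """Return a toy one-step scene-flow function with fixed random weights.

    Stand-in for a trained scene-flow MLP (assumption of this sketch):
    input is (x, y, z, t), output is the 3D displacement from t to t + 1.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.1, size=(4, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.1, size=(hidden, 3))
    b2 = np.zeros(3)

    def flow(points, t):
        t_col = np.full((points.shape[0], 1), float(t))
        inputs = np.concatenate([points, t_col], axis=1)  # (N, 4)
        hidden_act = np.tanh(inputs @ w1 + b1)
        return hidden_act @ w2 + b2                       # (N, 3)

    return flow

def unroll_scene_flow(flow, points, t_start, num_steps):
    """Apply the one-step flow repeatedly, feeding each output back in.

    Returns an array of shape (num_steps + 1, N, 3): the point positions
    at times t_start, t_start + 1, ..., t_start + num_steps.
    """
    trajectory = [points]
    current = points
    for k in range(num_steps):
        current = current + flow(current, t_start + k)
        trajectory.append(current)
    return np.stack(trajectory)

# Toy usage: two points unrolled over three time steps.
flow = make_toy_flow_mlp(seed=0)
points = np.array([[0.0, 0.0, 1.0], [0.5, -0.2, 2.0]])
trajectory = unroll_scene_flow(flow, points, t_start=0, num_steps=3)
short_range_flow = trajectory[1] - trajectory[0]   # one step
long_range_flow = trajectory[3] - trajectory[0]    # three chained steps
```

Because the long-range displacement is built from the same one-step function, a single small network can serve both the local smoothness prior and the wide-baseline consistency constraint.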
