FutureDepth: Learning to Predict the Future Improves Video Depth Estimation

Rajeev Yasarla; Manish Kumar Singh; Hong Cai; Yunxiao Shi; Jisoo Jeong; Yinhao Zhu; Shizhong Han; Risheek Garrepalli; Fatih Porikli

arXiv:2403.12953·cs.CV·January 17, 2025·1 cites

TL;DR

FutureDepth improves video depth estimation by training the model to predict the features of future frames, which lets it draw on multi-frame and motion cues and reach state-of-the-art accuracy across diverse benchmarks.

Contribution

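The paper proposes FutureDepth, a video depth estimation method that pairs a future prediction network (F-Net) with a masked reconstruction network (R-Net) and feeds their features into the depth decoding process, so that the model learns motion and multi-frame correspondence cues.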

Findings

01 Significantly outperforms existing methods on NYUDv2, KITTI, DDAD, and Sintel benchmarks.

02 Achieves state-of-the-art accuracy in video depth estimation.

03 Maintains efficiency comparable to monocular models.

Abstract

In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by making it learn to predict the future at training. More specifically, we propose a future prediction network, F-Net, which takes the features of multiple consecutive frames and is trained to predict multi-frame features one time step ahead iteratively. In this way, F-Net learns the underlying motion and correspondence information, and we incorporate its features into the depth decoding process. Additionally, to enrich the learning of multiframe correspondence cues, we further leverage a reconstruction network, R-Net, which is trained via adaptively masked auto-encoding of multiframe feature volumes. At inference time, both F-Net and R-Net are used to produce queries to work with the depth decoder, as…
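The iterative one-step-ahead prediction described for F-Net can be illustrated with a small rollout loop. This is a minimal sketch under stated assumptions, not the paper's implementation: per-frame features are flattened to plain lists of floats, `linear_extrapolation` is a hypothetical stand-in for the learned predictor, and the names `rollout` and `future_prediction_loss` are introduced here for illustration only. The sketch predicts one new frame feature per step and slides the window, a simplification of predicting the whole multi-frame feature set one step ahead.

```python
from typing import Callable, List, Sequence

Feature = List[float]  # assumption: a per-frame feature map, flattened
Predictor = Callable[[Sequence[Feature]], Feature]


def linear_extrapolation(window: Sequence[Feature]) -> Feature:
    """Hypothetical stand-in for a learned predictor.

    Extrapolates the next feature from the last two frames in the window.
    """
    prev, last = window[-2], window[-1]
    return [2.0 * b - a for a, b in zip(prev, last)]


def rollout(window: Sequence[Feature], predictor: Predictor, steps: int) -> List[Feature]:
    """Predict `steps` future features, one time step at a time.

    After each prediction the oldest frame is dropped and the prediction is
    appended, so the next step conditions on the model's own output.
    """
    current = [list(f) for f in window]
    predictions: List[Feature] = []
    for _ in range(steps):
        nxt = predictor(current)
        predictions.append(nxt)
        current = current[1:] + [nxt]
    return predictions


def mse(pred: Feature, target: Feature) -> float:
    """Mean squared error between two equally sized feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def future_prediction_loss(frames: Sequence[Feature], window_size: int,
                           predictor: Predictor) -> float:
    """Average rollout error against the features of the frames that follow."""
    targets = frames[window_size:]
    if not targets:
        raise ValueError("need at least one frame after the input window")
    preds = rollout(frames[:window_size], predictor, len(targets))
    return sum(mse(p, t) for p, t in zip(preds, targets)) / len(targets)


if __name__ == "__main__":
    # Toy features that change linearly over time (constant "motion").
    frames = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, 9.0]]
    print(rollout(frames[:3], linear_extrapolation, 2))
    print(future_prediction_loss(frames, 3, linear_extrapolation))
```

On the toy sequence the features move at a constant rate, so the stand-in predictor recovers the two held-out frames and the loss is zero. A trained network would replace `linear_extrapolation`, and a loss of this general shape is one plausible way to supervise it with the features of frames that actually follow.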

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques