Less is More: Consistent Video Depth Estimation with Masked Frames Modeling

Yiran Wang; Zhiyu Pan; Xingyi Li; Zhiguo Cao; Ke Xian; Jianming Zhang

arXiv:2208.00380 · cs.CV · August 19, 2022

TL;DR

This paper introduces FMNet, a spatial-temporal transformer that predicts the depth of masked frames from their neighboring frames, producing temporally consistent video depth without optical flow or camera poses.

Contribution

FMNet learns inter-frame correlations by reconstructing masked temporal features. This yields higher temporal consistency than prior methods at comparable spatial accuracy, with no additional inputs such as optical flow or camera poses.

Findings

01  Achieves comparable spatial accuracy to prior methods

02  Demonstrates higher temporal consistency

03  Does not require optical flow or camera poses

Abstract

Temporal consistency is the key challenge of video depth estimation. Previous works rely on additional optical flow or camera poses, which are time-consuming to obtain. By contrast, we derive consistency with less information. Since videos inherently contain heavy temporal redundancy, a missing frame can be recovered from its neighboring frames. Inspired by this, we propose the frame masking network (FMNet), a spatial-temporal transformer network that predicts the depth of masked frames based on their neighboring frames. By reconstructing masked temporal features, FMNet learns intrinsic inter-frame correlations, which leads to consistency. Compared with prior art, experimental results demonstrate that our approach achieves comparable spatial accuracy and higher temporal consistency without any additional information. Our work provides a new perspective on consistent video depth…
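
The masking idea in the abstract can be illustrated with a toy sketch. This is not the FMNet implementation: the helper names (`mask_frames`, `fill_from_neighbors`), the `MASK` placeholder, and the 50% mask ratio are all assumptions made for illustration, and the linear interpolation below is a deliberately naive stand-in for the spatial-temporal transformer. It only shows why temporal redundancy makes masked frames recoverable from their neighbors.

```python
import random

MASK = None  # hypothetical placeholder standing in for a learned mask token


def mask_frames(features, mask_ratio=0.5, seed=0):
    """Replace a random subset of per-frame features with MASK.

    The first and last frames are always kept so that every masked frame
    has a visible neighbor on both sides (a simplification for this sketch).
    """
    rng = random.Random(seed)
    interior = list(range(1, len(features) - 1))
    n_mask = int(len(interior) * mask_ratio)
    masked_idx = sorted(rng.sample(interior, n_mask))
    masked = [MASK if i in masked_idx else f for i, f in enumerate(features)]
    return masked, masked_idx


def fill_from_neighbors(masked):
    """Naive stand-in for the transformer: linearly interpolate each masked
    frame from the nearest visible frame on either side."""
    out = list(masked)
    visible = [i for i, f in enumerate(masked) if f is not MASK]
    for i, f in enumerate(masked):
        if f is not MASK:
            continue
        left = max(j for j in visible if j < i)
        right = min(j for j in visible if j > i)
        w = (i - left) / (right - left)
        out[i] = [(1 - w) * a + w * b
                  for a, b in zip(masked[left], masked[right])]
    return out


# Toy "video": 6 frames, each a 2-value feature vector that drifts smoothly.
frames = [[float(t), 2.0 * t] for t in range(6)]
masked, masked_idx = mask_frames(frames, mask_ratio=0.5)
recovered = fill_from_neighbors(masked)

# Largest reconstruction error over the masked frames.
max_err = max(abs(a - b)
              for i in masked_idx
              for a, b in zip(recovered[i], frames[i]))
```

Because the toy features change linearly over time, interpolation recovers the masked frames almost exactly. Real video features are not this regular, which is presumably why a learned model is used in place of a fixed interpolation rule.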

Code & Models

Repositories

raymondwang987/fmnet (PyTorch, official)
