Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation

Jinfeng Liu; Lingtong Kong; Bo Li; Zerong Wang; Hong Gu; and Jinwei Chen

arXiv:2407.14126 · cs.CV · July 22, 2024

TL;DR

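Mono-ViFI is a unified self-supervised framework for monocular depth estimation. It synthesizes virtual camera views with flow-based video frame interpolation (VFI) to strengthen training guidance, and fuses multi-frame features using the VFI model's motion and occlusion information, improving accuracy and efficiency.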

Contribution

It proposes a novel VFI-assisted multi-frame fusion module and a unified learning framework connecting single- and multi-frame depth estimation.

Findings

01. Significant improvement over existing methods in depth accuracy.
02. Effective virtual view synthesis enhances training guidance.
03. Shared weights enable a compact and memory-efficient model.

Abstract

Self-supervised monocular depth estimation has attracted notable interest because it frees training from dependence on depth annotations. In the monocular video training case, recent methods only conduct view synthesis between existing camera views, which provides insufficient guidance. To tackle this, we synthesize additional virtual camera views by flow-based video frame interpolation (VFI), which we term temporal augmentation. For multi-frame inference, to sidestep the problem of dynamic objects encountered by explicit geometry-based methods like ManyDepth, we return to the feature fusion paradigm and design a VFI-assisted multi-frame fusion module to align and aggregate multi-frame features, using motion and occlusion information obtained by the flow-based VFI model. Finally, we construct a unified self-supervised learning framework, named Mono-ViFI, to bilaterally connect single- and…
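
To make the "align and aggregate" idea in the abstract more concrete, here is a minimal NumPy sketch of flow-guided feature alignment followed by occlusion-aware fusion. It is only an illustration under stated assumptions: the function names `backward_warp` and `fuse_features`, the convention that the mask is a visibility map (1 = visible, 0 = occluded), and the simple weighted-average fusion rule are all hypothetical choices made for this example. They are not taken from the paper or the official repository, and the actual fusion module may differ substantially.

```python
import numpy as np


def backward_warp(feat, flow):
    """Bilinearly sample a source feature map at flow-displaced positions.

    feat: (C, H, W) source-frame features.
    flow: (2, H, W) per-pixel displacement from target to source, in pixels;
          flow[0] is horizontal (x), flow[1] is vertical (y).
    Sampling positions outside the image are clamped to the border.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets act as bilinear interpolation weights.
    wx = x - x0
    wy = y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def fuse_features(target_feat, source_feat, flow, visibility):
    """Align source features to the target frame, then blend them in.

    visibility: (H, W) values in [0, 1]; 1 means the source pixel is
    visible (trusted), 0 means it is occluded (ignored).
    This weighted average is an illustrative stand-in for a learned module.
    """
    aligned = backward_warp(source_feat, flow)
    return (target_feat + visibility * aligned) / (1.0 + visibility)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(8, 16, 16))
    source = rng.normal(size=(8, 16, 16))
    flow = rng.uniform(-2, 2, size=(2, 16, 16))
    visibility = rng.uniform(0, 1, size=(16, 16))
    print(fuse_features(target, source, flow, visibility).shape)
```

In this sketch, occluded regions fall back to the target frame's own features, while visible regions average the two frames. A learned module would typically replace the fixed average with trainable layers.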

Code & Models

Repositories

liujf1226/mono-vifi (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Image Processing Techniques and Applications

Methods: ALIGN