A Compacted Structure for Cross-domain learning on Monocular Depth and Flow Estimation

Yu Chen; Xu Cao; Xiaoyi Lin; Baoru Huang; Xiao-Yun Zhou; Jian-Qing Zheng; Guang-Zhong Yang

arXiv:2208.11993·cs.CV·August 26, 2022·1 cites

Open Access

TL;DR

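This paper introduces a compact multi-task learning framework for monocular depth and optical flow estimation. It exchanges information between the two domains through Flow to Depth (F2D) and Depth to Flow (D2F) mechanisms, predicts optical flow for rigid and non-rigid motion with a dual-head design, and uses an Exponential Moving Average (EMA) to stabilize multi-task training.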

Contribution

The paper proposes a multi-task scheme built on Flow to Depth (F2D), Depth to Flow (D2F), and Exponential Moving Average (EMA) components, which integrate features across the optical flow and depth domains and make predictions more robust.

Findings

01 Outperforms existing multi-task methods on the KITTI dataset

02 Significant improvements in depth and flow prediction accuracy

03 Dual-head mechanism enhances motion estimation for rigid and non-rigid objects

Abstract

Accurate motion and depth recovery is important for many robot vision tasks, including autonomous driving. Most previous studies have achieved cooperative multi-task interaction via either pre-defined loss functions or cross-domain prediction. This paper presents a multi-task scheme that achieves mutual assistance by means of our Flow to Depth (F2D), Depth to Flow (D2F), and Exponential Moving Average (EMA). The F2D and D2F mechanisms enable multi-scale information integration between the optical flow and depth domains based on differentiable shallow nets. A dual-head mechanism predicts optical flow for rigid and non-rigid motion in a divide-and-conquer manner, which significantly improves optical flow estimation performance. Furthermore, to make the prediction more robust and stable, EMA is used for our multi-task training. Experimental results on KITTI datasets show that our…
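
The abstract does not say which quantities the EMA is applied to or what decay is used. As a rough illustration only, the sketch below shows the generic exponential moving average update, shadow = decay * shadow + (1 - decay) * value, applied to a toy parameter dictionary. The function name `ema_update`, the parameter name `w`, and the decay of 0.5 are hypothetical choices for this example and are not taken from the paper.

```python
from typing import Dict


def ema_update(
    shadow: Dict[str, float],
    params: Dict[str, float],
    decay: float = 0.99,
) -> Dict[str, float]:
    """Return a new shadow dict: decay * shadow + (1 - decay) * params.

    Generic EMA update; which tensors the paper averages is an assumption here.
    """
    return {
        name: decay * shadow[name] + (1.0 - decay) * params[name]
        for name in params
    }


# Toy "training loop": the raw parameter jumps by 1.0 each step,
# while the EMA copy follows it more smoothly.
params = {"w": 0.0}
shadow = dict(params)  # initialise the EMA copy from the starting parameters
history = []
for step in range(3):
    params["w"] += 1.0  # stand-in for an optimizer step
    shadow = ema_update(shadow, params, decay=0.5)
    history.append(shadow["w"])

print(history)
```

In practice a decay much closer to 1 is common, so the averaged copy changes slowly and smooths out step-to-step noise in the raw parameters.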

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Human Pose and Action Recognition