A Compacted Structure for Cross-domain learning on Monocular Depth and Flow Estimation
Yu Chen, Xu Cao, Xiaoyi Lin, Baoru Huang, Xiao-Yun Zhou, Jian-Qing Zheng, Guang-Zhong Yang

TL;DR
This paper introduces a compact multi-task learning framework for monocular depth and optical flow estimation. It exchanges information between the two domains through Flow to Depth (F2D) and Depth to Flow (D2F) mechanisms and predicts flow with a dual-head design, improving accuracy on KITTI.
Contribution
The paper proposes a new multi-task scheme with Flow to Depth, Depth to Flow, and EMA modules, enabling better cross-domain feature integration and more robust predictions.
Findings
Outperforms existing multi-task methods on the KITTI dataset
Significant improvements in depth and flow prediction accuracy
Dual-head mechanism enhances motion estimation for rigid and non-rigid objects
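The summary does not say how the outputs of the two flow heads are combined. As a purely illustrative sketch, one common option is to blend a rigid-motion flow field and a non-rigid flow field with a per-pixel soft mask. The function name `blend_flow`, the `rigid_prob` mask, and the blending rule itself are assumptions made for this example, not details taken from the paper.

```python
def blend_flow(rigid_flow, nonrigid_flow, rigid_prob):
    """Blend two per-pixel flow fields with a soft rigidity mask.

    Hypothetical helper: the blending rule is an assumption, not the paper's.
    rigid_flow, nonrigid_flow: H x W nested lists of (u, v) tuples.
    rigid_prob: H x W nested list of floats in [0, 1]; 1 means "rigid".
    """
    blended = []
    for row_r, row_n, row_p in zip(rigid_flow, nonrigid_flow, rigid_prob):
        out_row = []
        for (ur, vr), (un, vn), p in zip(row_r, row_n, row_p):
            # Convex combination of the two heads at this pixel.
            out_row.append((p * ur + (1.0 - p) * un,
                            p * vr + (1.0 - p) * vn))
        blended.append(out_row)
    return blended


if __name__ == "__main__":
    rigid = [[(1.0, 0.0), (1.0, 0.0)]]
    nonrigid = [[(0.0, 2.0), (0.0, 2.0)]]
    mask = [[1.0, 0.0]]  # first pixel rigid, second pixel non-rigid
    print(blend_flow(rigid, nonrigid, mask))
```

With a mask of 1 the rigid head is used unchanged, and with a mask of 0 the non-rigid head is used unchanged; intermediate values interpolate between them.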
Abstract
Accurate motion and depth recovery is important for many robot vision tasks, including autonomous driving. Most previous studies have achieved cooperative multi-task interaction via either pre-defined loss functions or cross-domain prediction. This paper presents a multi-task scheme that achieves mutual assistance by means of our Flow to Depth (F2D), Depth to Flow (D2F), and Exponential Moving Average (EMA) mechanisms. The F2D and D2F mechanisms enable multi-scale information integration between the optical flow and depth domains based on differentiable shallow nets. A dual-head mechanism predicts optical flow for rigid and non-rigid motion in a divide-and-conquer manner, which significantly improves optical flow estimation performance. Furthermore, to make the prediction more robust and stable, EMA is used for our multi-task training. Experimental results on KITTI datasets show that our…
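The abstract does not specify what the EMA is applied to. A minimal sketch of the generic technique, assuming it is applied to model parameters as is common in training pipelines, keeps a "shadow" copy of each parameter that follows the live value slowly. The function name `ema_update`, the dictionary representation of parameters, and the decay value are assumptions for illustration only.

```python
def ema_update(shadow, params, decay=0.99):
    """Return new shadow values: decay * shadow + (1 - decay) * params.

    Hypothetical helper: `shadow` and `params` are dicts mapping a
    parameter name to a float. Real models would hold tensors instead.
    """
    return {
        name: decay * shadow[name] + (1.0 - decay) * value
        for name, value in params.items()
    }


if __name__ == "__main__":
    shadow = {"w": 0.0}
    # Pretend the live weight sits at 1.0 for two training steps.
    for _ in range(2):
        shadow = ema_update(shadow, {"w": 1.0}, decay=0.9)
    print(shadow)
```

A decay close to 1 makes the averaged parameters change slowly, which smooths out step-to-step noise at the cost of lagging behind the live parameters.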
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Human Pose and Action Recognition
