MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion
Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen

TL;DR
This paper introduces MFuseNet, a multiscopic vision system that moves a single low-cost monocular RGB camera to horizontally or vertically aligned positions and fuses the resulting cost volumes to estimate depth, outperforming traditional two-frame stereo matching.
Contribution
The paper presents a multiscopic vision system with two ways to fuse multiple cost volumes, a new heuristic method and a robust learning-based method, along with a synthetic multiscopic dataset for training.
Findings
Outperforms traditional two-frame stereo matching methods in depth estimation
Fuses multiple cost volumes between the reference image and its surrounding images to improve depth estimation
Evaluated on the real-world Middlebury dataset and in a real robot demonstration
Abstract
We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation. Unlike multi-view stereo with images captured at unconstrained camera poses, the proposed system controls the motion of a camera to capture a sequence of images in horizontally or vertically aligned positions with the same parallax. In this system, we propose a new heuristic method and a robust learning-based method to fuse multiple cost volumes between the reference image and its surrounding images. To obtain training data, we build a synthetic dataset with multiscopic images. The experiments on the real-world Middlebury dataset and real robot demonstration show that our multiscopic vision system outperforms traditional two-frame stereo matching methods in depth estimation. Our code and dataset are available at https://sites.google.com/view/multiscopic.
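The abstract does not spell out how the cost volumes are built or fused, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a center reference image with one neighbor on each side at the same baseline, an absolute-difference matching cost, and a per-pixel minimum as the fusion rule. All function names, the sign convention for `direction`, and the parameters `focal_px` and `baseline_m` are hypothetical.

```python
import numpy as np

def cost_volume(ref, src, max_disp, direction):
    """Absolute-difference cost volume of shape (max_disp + 1, H, W).

    Assumed convention: a point at column x in `ref` appears at column
    x + direction * d in `src` for disparity d. Entries whose match would
    fall outside the image stay at infinity.
    """
    h, w = ref.shape
    vol = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        shift = direction * d
        if shift >= 0:
            vol[d, :, : w - shift] = np.abs(ref[:, : w - shift] - src[:, shift:])
        else:
            vol[d, :, -shift:] = np.abs(ref[:, -shift:] - src[:, : w + shift])
    return vol

def fuse_min(vol_a, vol_b):
    """Illustrative heuristic fusion: keep the lower cost at each (d, y, x).

    A point hidden or out of view in one neighbor can still be matched
    in the other. This is an assumption, not the paper's fusion rule.
    """
    return np.minimum(vol_a, vol_b)

def winner_take_all(vol):
    """Pick the disparity with the lowest fused cost at every pixel."""
    return np.argmin(vol, axis=0)

def disparity_to_depth(disp, focal_px, baseline_m):
    """Standard pinhole relation depth = f * b / d; zero disparity maps to inf."""
    disp = disp.astype(float)
    depth = np.full(disp.shape, np.inf)
    valid = disp > 0
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth
```

Because both neighbors share the same parallax, their cost volumes index the same disparity hypotheses and can be combined element-wise. A learned fusion would replace `fuse_min` with a network that takes the stacked volumes as input.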
