TL;DR
This paper introduces the first unsupervised method for learning 3D scene flow from monocular camera images, enabling training on real-world data without ground truth labels, and demonstrates competitive performance on KITTI.
Contribution
It presents a novel unsupervised learning framework for 3D scene flow from monocular images, combining depth, pose, and scene flow estimation with multiple loss functions.
Findings
Achieves effective scene flow estimation without ground truth data.
Outperforms traditional methods such as ICP and FGR on the KITTI dataset.
Broadens training data scope by using real-world monocular images.
Abstract
Scene flow represents the motion of points in 3D space and is the counterpart of optical flow, which represents the motion of pixels in the 2D image. However, ground truth scene flow is difficult to obtain in real scenes, so recent studies rely on synthetic data for training. Training a scene flow network on real-world data with unsupervised methods is therefore of crucial significance. This paper proposes a novel unsupervised learning method for scene flow that trains on pairs of consecutive frames taken by a monocular camera, without scene flow ground truth. The method makes it possible to train a scene flow network on real-world data, which bridges the gap between training data and test data and broadens the scope of data available for training. Unsupervised learning of scene flow in this paper mainly…
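The relationship between scene flow and optical flow described in the abstract can be illustrated with a small sketch. It is not the paper's method: it assumes a standard pinhole camera model, and the function names `project` and `induced_optical_flow` as well as the intrinsics values are hypothetical, chosen only for illustration. Given a 3D point and its scene flow vector, projecting the point before and after the motion yields the 2D optical flow it induces.

```python
# Illustrative sketch only: a pinhole camera model is assumed, and all names
# and intrinsics values here are hypothetical, not taken from the paper.

def project(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def induced_optical_flow(point, scene_flow, fx, fy, cx, cy):
    """Return the 2D pixel displacement caused by a 3D displacement of a point."""
    # Scene flow is the 3D motion vector, so the moved point is point + flow.
    moved = tuple(p + f for p, f in zip(point, scene_flow))
    u0, v0 = project(point, fx, fy, cx, cy)
    u1, v1 = project(moved, fx, fy, cx, cy)
    return (u1 - u0, v1 - v0)


# Assumed intrinsics: focal length 100 px, principal point at the origin.
K = dict(fx=100.0, fy=100.0, cx=0.0, cy=0.0)

# A point 10 m ahead moving 1 m sideways shifts horizontally in the image.
lateral = induced_optical_flow((1.0, 0.0, 10.0), (1.0, 0.0, 0.0), **K)

# A point on the optical axis moving only in depth produces no pixel motion.
axial = induced_optical_flow((0.0, 0.0, 10.0), (0.0, 0.0, 5.0), **K)
```

The second case shows why optical flow alone cannot recover 3D motion: a displacement along the viewing ray leaves the pixel position unchanged, which is consistent with the summary's statement that the framework combines depth and pose estimation with scene flow estimation.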
