TL;DR
This paper presents a novel method to generate large-scale, accurate optical flow training data from single images using monocular depth estimation and virtual camera movements, improving real-world generalization.
Contribution
The authors introduce a framework that creates ground-truth optical flow annotations from single images, enhancing training data availability and model generalization.
Findings
Models trained with generated data outperform those trained on synthetic datasets.
Generated data improves generalization to unseen real-world data.
Combining generated data with synthetic images yields better results.
Abstract
This paper deals with the scarcity of data for training optical flow networks, highlighting the limitations of existing sources such as labeled synthetic datasets or unlabeled real videos. Specifically, we introduce a framework to generate accurate ground-truth optical flow annotations quickly and in large amounts from any readily available single real picture. Given an image, we use an off-the-shelf monocular depth estimation network to build a plausible point cloud for the observed scene. Then, we virtually move the camera in the reconstructed environment with known motion vectors and rotation angles, allowing us to synthesize both a novel view and the corresponding optical flow field connecting each pixel in the input image to the one in the new frame. When trained with our data, state-of-the-art optical flow networks achieve superior generalization to unseen real data compared to…
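The geometric core of the pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes a pinhole camera with a known intrinsic matrix `K`, a depth map already produced by some monocular depth network, and a virtual motion `(R, t)` that maps points from the original camera frame to the new one. The function name `flow_from_depth` and all parameter names are hypothetical. Synthesizing the novel view itself (forward warping, handling occlusions and holes) is not covered here.

```python
import numpy as np

def flow_from_depth(depth, K, R, t):
    """Optical flow induced by a virtual camera motion over a static scene.

    depth: (h, w) per-pixel depth along the optical axis (assumed given).
    K:     (3, 3) pinhole intrinsics (assumed known).
    R, t:  rotation (3, 3) and translation (3,) mapping points from the
           original camera frame to the moved camera frame (assumed convention).
    Returns an (h, w, 2) array of (du, dv) displacements in pixels.
    """
    h, w = depth.shape
    # Homogeneous pixel coordinates, shape (3, h*w), in row-major order.
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    # Back-project every pixel to a 3D point using its depth.
    points = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Express the point cloud in the moved camera's frame.
    moved = R @ points + np.asarray(t, dtype=np.float64).reshape(3, 1)
    # Project into the new image plane and normalize by depth.
    proj = K @ moved
    new_pix = proj[:2] / proj[2]
    # Flow is the displacement from the original pixel to its new location.
    return (new_pix - pix[:2]).T.reshape(h, w, 2)

# Usage: a fronto-parallel plane at depth 2 with a small sideways translation.
depth = np.full((4, 6), 2.0)
K = np.array([[100.0, 0.0, 3.0],
              [0.0, 100.0, 2.0],
              [0.0, 0.0, 1.0]])
flow = flow_from_depth(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

Because the motion is chosen rather than estimated, the resulting flow is exact with respect to the estimated geometry, which is what allows it to be used as a ground-truth label for the synthesized image pair.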
