FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox

TL;DR
FlowNet 2.0 improves learning-based optical flow estimation through a better training data schedule, a stacked architecture with warping, and a sub-network for small displacements, reaching accuracy on par with state-of-the-art methods at interactive frame rates.
Contribution
We introduce a stacked architecture with warping and a specialized small displacement sub-network, along with optimized training data scheduling, to greatly enhance optical flow estimation quality.
Findings
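Reduces estimation error by more than 50% relative to the original FlowNet
Faster variants run at up to 140fps with accuracy matching the original FlowNet
Performs on par with state-of-the-art methods while running at interactive frame rates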
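To make the error reduction above concrete, here is a minimal sketch of how a flow estimation error can be computed. It assumes the error is the average endpoint error (EPE), the usual optical flow metric; this summary does not name the metric, and the function name and array layout are illustrative choices.

```python
import numpy as np

def average_endpoint_error(flow_pred, flow_gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    Both arrays are assumed to have shape (H, W, 2), holding the horizontal
    and vertical displacement of every pixel.
    """
    diff = flow_pred - flow_gt
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

# Toy data: the ground truth is zero motion everywhere.
flow_gt = np.zeros((2, 2, 2))
baseline = np.tile(np.array([3.0, 4.0]), (2, 2, 1))   # every vector off by (3, 4)
improved = np.tile(np.array([1.5, 2.0]), (2, 2, 1))   # every vector off by (1.5, 2)

epe_baseline = average_endpoint_error(baseline, flow_gt)
epe_improved = average_endpoint_error(improved, flow_gt)
relative_reduction = 1.0 - epe_improved / epe_baseline
```

In this toy case the second estimate halves the endpoint error of the first, so `relative_reduction` is 0.5; the numbers are made up and are not results from the paper.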
Abstract
The FlowNet demonstrated that optical flow estimation can be cast as a learning problem. However, the state of the art with regard to the quality of the flow has still been defined by traditional methods. Particularly on small displacements and real-world data, FlowNet cannot compete with variational methods. In this paper, we advance the concept of end-to-end learning of optical flow and make it work really well. The large improvements in quality and speed are caused by three major contributions: first, we focus on the training data and show that the schedule of presenting data during training is very important. Second, we develop a stacked architecture that includes warping of the second image with intermediate optical flow. Third, we elaborate on small displacements by introducing a sub-network specializing on small motions. FlowNet 2.0 is only marginally slower than the original…
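The warping step mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration of backward warping with bilinear interpolation, not the authors' implementation: the function name, the (H, W, 2) flow layout, and the clamping of out-of-range samples to the image border are all assumptions made for the example.

```python
import numpy as np

def warp_backward(image2, flow):
    """Sample image2 at (x + u, y + v) for every pixel (x, y).

    image2: float array of shape (H, W) or (H, W, C).
    flow:   float array of shape (H, W, 2); flow[..., 0] is the horizontal
            displacement u and flow[..., 1] the vertical displacement v.
    Sample positions outside the image are clamped to the border (assumption).
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)

    # Target coordinates in the second image, clamped to valid range.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each sample position.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets used as bilinear weights.
    wx = x - x0
    wy = y - y0
    if image2.ndim == 3:
        wx = wx[..., None]
        wy = wy[..., None]

    top = (1 - wx) * image2[y0, x0] + wx * image2[y0, x1]
    bottom = (1 - wx) * image2[y1, x0] + wx * image2[y1, x1]
    return (1 - wy) * top + wy * bottom

# Toy example: a 3x4 ramp image and a flow of one pixel to the right.
image2 = np.arange(12, dtype=np.float64).reshape(3, 4)
flow = np.zeros((3, 4, 2))
flow[..., 0] = 1.0
warped = warp_backward(image2, flow)
```

If the intermediate flow were exact, the warped second image would line up with the first image, so a later network in the stack only has to estimate the remaining residual motion.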
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Enhancement Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
