ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video
Han Ling, Quansen Sun

TL;DR
ScaleFlow++ introduces a robust, end-to-end method for estimating 3D motion from just two RGB images, leveraging cross-scale matching to improve accuracy and generalization in various scenes.
Contribution
It proposes a novel cross-scale matching approach and an integrated architecture for joint optical flow and motion-in-depth estimation from monocular images.
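To make concrete what a joint optical flow and motion-in-depth estimate provides, the following is a minimal sketch of lifting the two quantities to a 3D displacement under a pinhole camera model. It assumes the common convention that motion-in-depth is the ratio of a point's depth in the second frame to its depth in the first; this summary does not state the definition. The function name `lift_to_3d_motion`, the intrinsics `K`, and the optional `depth1` input are hypothetical and are not taken from the paper.

```python
import numpy as np

def lift_to_3d_motion(pixel, flow, mid, K, depth1=1.0):
    """Lift optical flow and motion-in-depth to a 3D displacement (pinhole model).

    Assumption: mid = Z2 / Z1. With depth1 left at 1.0 the result is
    expressed in units of the (unknown) first-frame depth.
    """
    K_inv = np.linalg.inv(K)
    p1 = np.array([pixel[0], pixel[1], 1.0])                      # pixel in frame 1
    p2 = np.array([pixel[0] + flow[0], pixel[1] + flow[1], 1.0])  # matched pixel in frame 2
    P1 = depth1 * (K_inv @ p1)         # back-projected 3D point at time 1
    P2 = mid * depth1 * (K_inv @ p2)   # back-projected 3D point at time 2
    return P2 - P1

# Hypothetical intrinsics: focal length 700 px, principal point (320, 240).
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

# A point on the optical axis with zero flow whose depth halves
# moves straight toward the camera.
motion = lift_to_3d_motion(pixel=(320.0, 240.0), flow=(0.0, 0.0),
                           mid=0.5, K=K, depth1=10.0)
```

With `depth1` unknown, as in the two-RGB-frame setting, the same call yields motion up to the first-frame depth scale.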
Findings
Achieved state-of-the-art performance on the KITTI dataset
Surpassed RGBD methods in motion-in-depth estimation
Exhibited excellent zero-shot generalization in diverse scenes
Abstract
Perceiving and understanding 3D motion is a core technology in fields such as autonomous driving, robotics, and motion prediction. This paper proposes ScaleFlow++, a 3D motion perception method that generalizes easily. With just a pair of RGB images, ScaleFlow++ can robustly estimate optical flow and motion-in-depth (MID). Most existing methods regress MID directly from two RGB frames or from optical flow, which yields inaccurate and unstable results. Our key insight is cross-scale matching, which extracts deep motion cues by matching objects in pairs of images at different scales. Unlike previous methods, ScaleFlow++ integrates optical flow and MID estimation into a unified architecture, estimating both end-to-end based on feature matching. We also propose modules such as a global initialization network, a global iterative optimizer, and hybrid training…
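The intuition behind cross-scale matching can be illustrated with a toy example: under a pinhole model an object's apparent size is inversely proportional to its depth, so the scale at which the first frame best matches the second gives a motion-in-depth estimate. The sketch below uses a brute-force scale search on a 1D signal with normalized correlation. It is not the ScaleFlow++ network, which matches learned features; the function names and the convention MID = Z2 / Z1 are assumptions made for illustration.

```python
import numpy as np

def resample(signal, scale):
    """Magnify (scale > 1) or shrink a 1D signal about its center, linear interpolation."""
    n = signal.shape[0]
    x = np.arange(n, dtype=np.float64)
    c = (n - 1) / 2.0
    src = c + (x - c) / scale          # where each output sample reads from
    return np.interp(src, x, signal)   # out-of-range reads clamp to the edge values

def normalized_correlation(a, b):
    """Zero-mean normalized correlation in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def estimate_mid_by_scale_search(f1, f2, scales):
    """Return (mid, best_scale), assuming mid = Z2 / Z1 = 1 / apparent scale change."""
    scores = [normalized_correlation(resample(f1, s), f2) for s in scales]
    best_scale = float(scales[int(np.argmax(scores))])
    return 1.0 / best_scale, best_scale

# Toy "object": a textured bump. In frame 2 it appears 1.25x larger,
# i.e. it has moved closer to the camera.
x = np.arange(129, dtype=np.float64)
f1 = np.exp(-((x - 64.0) / 8.0) ** 2) * np.cos(0.5 * (x - 64.0))
f2 = resample(f1, 1.25)

mid, best_scale = estimate_mid_by_scale_search(f1, f2, np.linspace(0.5, 2.0, 31))
```

Here the search recovers the 1.25x magnification, which corresponds to a motion-in-depth of about 0.8 under the stated convention. A learned method would replace the raw signal with feature maps and the exhaustive search with a differentiable matching step.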
Taxonomy
Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Image and Video Stabilization
