ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video
Han Ling, Yinghui Sun, Quansen Sun, Yuhui Zheng

TL;DR
ScaleFlow++ estimates 3D motion from just two RGB images. It combines cross-scale matching with joint estimation of optical flow and motion-in-depth (MID), and reports state-of-the-art results and strong generalization.
Contribution
It introduces cross-scale matching and a unified architecture for optical flow and MID estimation, improving accuracy and robustness over prior methods.
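As a rough intuition for why matching across scales carries depth information, the toy sketch below works in 1D with NumPy. It is not the paper's network: the test signal, the function names `make_pattern` and `best_scale`, and the candidate scale grid are all hypothetical. It also assumes a pinhole camera and a fronto-parallel object, so an apparent magnification of s between frames corresponds to a depth ratio Z2/Z1 = 1/s, and it assumes MID is defined as that depth ratio.

```python
import numpy as np

def make_pattern(x):
    # Arbitrary non-periodic 1D "texture" (hypothetical test signal).
    return np.exp(-x**2 / 8.0) * np.cos(1.5 * x) + 0.5 * np.exp(-(x - 2.0)**2 / 2.0)

def best_scale(f1, f2, x, candidate_scales):
    """Return the candidate scale s for which f2 sampled at s*x best matches f1."""
    errors = []
    for s in candidate_scales:
        # If frame 2 is frame 1 magnified by s, then f2(s * x) == f1(x).
        f2_rescaled = np.interp(s * x, x, f2)
        errors.append(np.mean((f1 - f2_rescaled) ** 2))
    return float(candidate_scales[int(np.argmin(errors))])

x = np.linspace(-10.0, 10.0, 2001)
true_scale = 1.25                       # object looks 1.25x larger in frame 2
f1 = make_pattern(x)                    # "frame 1"
f2 = make_pattern(x / true_scale)       # "frame 2": magnified copy of frame 1
candidates = np.round(np.arange(0.5, 2.01, 0.05), 2)

s_hat = best_scale(f1, f2, x, candidates)
mid_hat = 1.0 / s_hat                   # assumed convention: MID = Z2 / Z1
print(s_hat, mid_hat)
```

A brute-force scale search like this only illustrates the geometric cue. The paper describes a learned, feature-based matching pipeline instead.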
Findings
Achieved best monocular scene flow performance on KITTI.
Surpassed RGBD-based methods in MID estimation.
Exhibited strong zero-shot generalization in various scenes.
Abstract
Perceiving and understanding 3D motion is a core technology in fields such as autonomous driving, robotics, and motion prediction. This paper proposes a 3D motion perception method called ScaleFlow++ that generalizes easily. With just a pair of RGB images, ScaleFlow++ can robustly estimate optical flow and motion-in-depth (MID). Most existing methods directly regress MID from two RGB frames or optical flow, resulting in inaccurate and unstable results. Our key insight is cross-scale matching, which extracts deep motion cues by matching objects in pairs of images at different scales. Unlike previous methods, ScaleFlow++ integrates optical flow and MID estimation into a unified architecture, estimating optical flow and MID end-to-end based on feature matching. We also propose modules such as a global initialization network, a global iterative optimizer, and hybrid training…
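To make concrete how optical flow and MID together describe 3D motion, here is a minimal NumPy sketch of the standard pinhole back-projection. It is an illustration under stated assumptions, not code from the paper: the function name `scene_flow_from_flow_and_mid` is hypothetical, MID is assumed to be the depth ratio Z2/Z1, and the first-frame depth and the intrinsics matrix `K` are assumed to come from elsewhere. Without metric depth, the same formula gives 3D motion only up to scale.

```python
import numpy as np

def scene_flow_from_flow_and_mid(uv, flow, mid, depth1, K):
    """Lift 2D flow plus motion-in-depth to 3D displacement vectors.

    uv:     (N, 2) pixel coordinates in frame 1
    flow:   (N, 2) optical flow in pixels
    mid:    (N,)   assumed depth ratio Z2 / Z1
    depth1: (N,)   depth Z1 in frame 1 (assumed known)
    K:      (3, 3) pinhole camera intrinsics (assumed known)
    """
    K_inv = np.linalg.inv(K)
    ones = np.ones((uv.shape[0], 1))
    rays1 = np.hstack([uv, ones]) @ K_inv.T           # rays through frame-1 pixels
    rays2 = np.hstack([uv + flow, ones]) @ K_inv.T    # rays through matched pixels
    X1 = depth1[:, None] * rays1                      # 3D point at time 1
    X2 = (depth1 * mid)[:, None] * rays2              # 3D point at time 2
    return X2 - X1

# Hypothetical example: one point at the principal point, 10 units away,
# moving 10 px to the right while its depth halves.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
uv = np.array([[50.0, 50.0]])
flow = np.array([[10.0, 0.0]])
mid = np.array([0.5])
depth1 = np.array([10.0])

sf = scene_flow_from_flow_and_mid(uv, flow, mid, depth1, K)
print(sf)
```

In this example the point moves from (0, 0, 10) to (0.5, 0, 5), so the printed displacement is approximately (0.5, 0, -5).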
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
