ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video

Han Ling; Yinghui Sun; Quansen Sun; Yuhui Zheng

arXiv:2409.12202·cs.CV·October 15, 2024

TL;DR

ScaleFlow++ estimates 3D motion from a single pair of RGB images by combining cross-scale matching with unified optical flow and motion-in-depth (MID) estimation. It reports state-of-the-art results and strong zero-shot generalization.

Contribution

The paper introduces cross-scale matching and a unified architecture for optical flow and MID estimation, improving accuracy and robustness over prior methods.

Findings

01 Achieved the best monocular scene flow performance on KITTI.

02 Surpassed RGBD-based methods in MID estimation.

03 Exhibited strong zero-shot generalization across a variety of scenes.

Abstract

Perceiving and understanding 3D motion is a core technology in fields such as autonomous driving, robotics, and motion prediction. This paper proposes ScaleFlow++, a 3D motion perception method that generalizes easily. Given only a pair of RGB images, ScaleFlow++ robustly estimates optical flow and motion-in-depth (MID). Most existing methods regress MID directly from two RGB frames or from optical flow, which yields inaccurate and unstable results. Our key insight is cross-scale matching, which extracts deep motion clues by matching objects across the image pair at different scales. Unlike previous methods, ScaleFlow++ integrates optical flow and MID estimation into a unified architecture, estimating both end-to-end based on feature matching. We also propose modules such as a global initialization network, a global iterative optimizer, and hybrid training…
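
To make the relationship between optical flow, MID, and 3D motion concrete, the following is a minimal sketch and not the authors' code. It assumes the common definition of MID as the ratio of a point's depth in the second frame to its depth in the first, a pinhole camera with known intrinsics, and a first-frame depth value obtained from some other source (the abstract does not specify any of these). The names Intrinsics, backproject, and scene_flow_at_pixel are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (assumed known; hypothetical helper)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth, k):
    """Lift pixel (u, v) at the given depth to a 3D point in the camera frame."""
    x = (u - k.cx) / k.fx * depth
    y = (v - k.cy) / k.fy * depth
    return (x, y, depth)


def scene_flow_at_pixel(u, v, flow_uv, mid, depth1, k):
    """Combine optical flow and MID into a 3D displacement for one pixel.

    Assumption: mid = depth2 / depth1, so mid < 1 means the point approaches
    the camera and mid > 1 means it recedes.
    """
    p1 = backproject(u, v, depth1, k)
    depth2 = mid * depth1  # depth in the second frame implied by MID
    p2 = backproject(u + flow_uv[0], v + flow_uv[1], depth2, k)
    return tuple(b - a for a, b in zip(p1, p2))


if __name__ == "__main__":
    k = Intrinsics(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
    # A point on the optical axis, 10 units away, with no image motion
    # and a MID of 0.5 (its depth halves between frames).
    print(scene_flow_at_pixel(50.0, 50.0, (0.0, 0.0), 0.5, 10.0, k))
```

Under these assumptions, optical flow alone fixes only the 2D image displacement, while MID supplies the missing depth-change factor. This is why estimating the two jointly is enough to recover 3D motion up to the first-frame depth.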

Code & Models

Repositories

HanLingsgjk/CSCV (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Video Surveillance and Tracking Methods