TL;DR
RAFT-MSF++ is a self-supervised multi-frame method that recurrently fuses temporal features to improve monocular scene flow estimation, especially in occluded regions, by combining a Geometry-Motion Feature (GMF) with relative positional attention and occlusion regularization.
Contribution
It introduces a recurrent fusion framework with Geometry-Motion Features and occlusion-aware modules for enhanced temporal reasoning in monocular scene flow estimation.
Findings
Achieves 24.14% SF-all on the KITTI Scene Flow benchmark.
Reports a 30.99% improvement over the baseline.
Demonstrates robustness in occluded regions.
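For context, SF-all on KITTI is an outlier rate, so lower is better. The following is a minimal sketch of how such a rate can be computed, assuming the standard KITTI 2015 criterion: an estimate counts as an outlier when its error exceeds both 3 px and 5% of the ground-truth magnitude, and a pixel counts toward the scene flow error if any of D1, D2, or Fl is an outlier. The function names and toy inputs are hypothetical and are not taken from the paper or its code.

```python
def is_outlier(err, gt_mag, abs_thresh=3.0, rel_thresh=0.05):
    """KITTI-style test: the error must exceed both the absolute and the relative threshold."""
    return err > abs_thresh and err > rel_thresh * gt_mag


def sf_outlier_rate(d1, d2, fl):
    """Percentage of pixels that are outliers in D1, D2, or Fl.

    Each argument is a list of (error, ground_truth_magnitude) pairs,
    one pair per valid ground-truth pixel, in the same pixel order.
    """
    if not (len(d1) == len(d2) == len(fl)):
        raise ValueError("all inputs must cover the same pixels")
    if not d1:
        raise ValueError("no valid pixels")
    bad = sum(
        1
        for a, b, c in zip(d1, d2, fl)
        if is_outlier(*a) or is_outlier(*b) or is_outlier(*c)
    )
    return 100.0 * bad / len(d1)


# Toy data for four pixels (hypothetical values, not benchmark results).
d1 = [(1.0, 40.0), (4.0, 40.0), (4.0, 100.0), (0.5, 10.0)]
d2 = [(2.0, 40.0), (1.0, 40.0), (2.0, 100.0), (0.5, 10.0)]
fl = [(0.5, 5.0), (1.0, 5.0), (1.0, 5.0), (3.5, 20.0)]
rate = sf_outlier_rate(d1, d2, fl)
```

In the toy data, the third pixel has a 4 px disparity error but is not an outlier, because 4 px is below 5% of its ground-truth magnitude of 100; the relative threshold keeps large-magnitude pixels from being penalized for proportionally small errors.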
Abstract
Monocular scene flow estimation aims to recover dense 3D motion from image sequences, yet most existing methods are limited to two-frame inputs, restricting temporal modeling and robustness to occlusions. We propose RAFT-MSF++, a self-supervised multi-frame framework that recurrently fuses temporal features to jointly estimate depth and scene flow. Central to our approach is the Geometry-Motion Feature (GMF), which compactly encodes coupled motion and geometry cues and is iteratively updated for effective temporal reasoning. To ensure the robustness of this temporal fusion against occlusions, we incorporate relative positional attention to inject spatial priors and an occlusion regularization module to propagate reliable motion from visible regions. These components enable the GMF to effectively propagate information even in ambiguous areas. Extensive experiments show that RAFT-MSF++…
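To make the idea of propagating reliable motion from visible regions concrete, here is a toy 1D sketch. It is not the paper's occlusion regularization module or its relative positional attention; it only illustrates the general pattern of filling an occluded pixel with a softmax-weighted average of visible pixels, where the weights favor similar appearance and small relative offsets. The function `propagate_motion`, the scalar features, and the `pos_scale` parameter are all hypothetical.

```python
import math


def propagate_motion(motion, visible, features, pos_scale=1.0):
    """Fill occluded entries with an attention-weighted mean of visible ones.

    motion:    list of floats, one motion value per pixel (1D toy row)
    visible:   list of bools, True where the pixel is not occluded
    features:  list of floats, a scalar appearance feature per pixel
    pos_scale: strength of the relative-position penalty (hypothetical knob)
    """
    sources = [j for j, v in enumerate(visible) if v]
    if not sources:
        return list(motion)  # nothing reliable to propagate from
    out = list(motion)
    for i, v in enumerate(visible):
        if v:
            continue  # visible pixels keep their own motion
        # Higher score for similar appearance and smaller relative offset.
        scores = [
            -abs(features[i] - features[j]) - pos_scale * abs(i - j)
            for j in sources
        ]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out[i] = sum(w * motion[j] for w, j in zip(weights, sources)) / total
    return out


# Toy row: pixel 2 is occluded and carries an unreliable motion value (9.0).
motion = [1.0, 1.0, 9.0, 5.0, 5.0]
visible = [True, True, False, True, True]
features = [0.0, 0.0, 1.0, 1.0, 1.0]
filled = propagate_motion(motion, visible, features)
```

Here the occluded pixel resembles its right-hand neighbors, so its filled value lands closer to their motion (5.0) than to the left-hand motion (1.0), and the unreliable original value is discarded.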
