MS-RAFT-3D: A Multi-Scale Architecture for Recurrent Image-Based Scene Flow
Jakob Schmid, Azin Jahedi, Noah Berenguel Senn, Andrés Bruhn

TL;DR
This paper introduces MS-RAFT-3D, a multi-scale recurrent neural network architecture for image-based scene flow estimation that outperforms the previous state of the art on the KITTI and Spring datasets.
Contribution
It extends single-scale recurrent scene flow models by incorporating multi-scale hierarchical ideas, improving accuracy in scene flow estimation.
Findings
Outperforms the previous state of the art on KITTI by 8.7% (3.89 vs. 4.26)
Outperforms the previous state of the art on Spring by 65.8% (9.13 vs. 26.71)
Shows that multi-scale, coarse-to-fine ideas from optical flow carry over to image-based scene flow
Abstract
Although multi-scale concepts have recently proven useful for recurrent network architectures in the field of optical flow and stereo, they have not been considered for image-based scene flow so far. Hence, based on a single-scale recurrent scene flow backbone, we develop a multi-scale approach that generalizes successful hierarchical ideas from optical flow to image-based scene flow. By considering suitable concepts for the feature and the context encoder, the overall coarse-to-fine framework and the training loss, we succeed in designing a scene flow approach that outperforms the current state of the art on KITTI and Spring by 8.7% (3.89 vs. 4.26) and 65.8% (9.13 vs. 26.71), respectively. Our code is available at https://github.com/cv-stuttgart/MS-RAFT-3D.
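The percentages quoted in the abstract can be reproduced from the score pairs given in parentheses. The sketch below assumes that both pairs are error scores where lower is better, and that the improvement is measured relative to the previous best score; the abstract does not name the metrics, and the function name `relative_improvement` is purely illustrative.

```python
def relative_improvement(ours: float, previous_best: float) -> float:
    """Relative error reduction in percent, assuming lower scores are better."""
    return 100.0 * (previous_best - ours) / previous_best


# Score pairs as quoted in the abstract (ours vs. previous best).
kitti = relative_improvement(3.89, 4.26)
spring = relative_improvement(9.13, 26.71)

print(f"KITTI:  {kitti:.1f}%")   # 8.7%
print(f"Spring: {spring:.1f}%")  # 65.8%
```

Under these assumptions, both figures match the abstract: (4.26 - 3.89) / 4.26 ≈ 0.087 and (26.71 - 9.13) / 26.71 ≈ 0.658.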
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Medical Image Segmentation Techniques · Advanced Data Storage Technologies
