RAFT-3D: Scene Flow using Rigid-Motion Embeddings
Zachary Teed, Jia Deng

TL;DR
RAFT-3D introduces a novel deep architecture for scene flow estimation that iteratively updates pixelwise 3D motion using rigid-motion embeddings, achieving state-of-the-art accuracy on benchmark datasets.
Contribution
It adapts the RAFT optical flow model to 3D scene flow by adding rigid-motion embeddings and the differentiable Dense-SE3 layer, improving accuracy without object instance supervision.
Findings
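Achieves 83.7% accuracy (d < 0.05) on FlyingThings3D under the two-view evaluation, up from the best published 34.3%
Achieves an error of 5.77 on KITTI, outperforming the best published method (6.31) without object instance supervision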
Demonstrates effectiveness of rigid-motion embeddings in scene flow estimation
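The summary does not define the d < 0.05 accuracy figure. A minimal sketch of how such a threshold accuracy could be computed, assuming d is the per-pixel Euclidean error between predicted and ground-truth 3D flow and the accuracy is the fraction of valid pixels below the threshold. The function name, argument names, and array layout are hypothetical and not taken from the authors' code.

```python
import numpy as np

def threshold_accuracy(pred_flow, gt_flow, threshold=0.05, valid=None):
    """Fraction of valid pixels whose 3D flow error is below `threshold`.

    pred_flow, gt_flow: arrays of shape (H, W, 3) holding 3D motion per pixel.
    valid: optional boolean mask of shape (H, W); all pixels count if omitted.
    Assumption: the error d is the Euclidean norm of the flow difference.
    """
    err = np.linalg.norm(pred_flow - gt_flow, axis=-1)
    if valid is None:
        valid = np.ones(err.shape, dtype=bool)
    return float((err[valid] < threshold).mean())

# Toy example: four pixels with errors 0.01, 0.04, 0.06 and 0.2 along x.
gt = np.zeros((2, 2, 3))
pred = np.zeros((2, 2, 3))
pred[..., 0] = np.array([[0.01, 0.04], [0.06, 0.2]])
acc = threshold_accuracy(pred, gt)  # two of four pixels fall below 0.05
```

Reporting the metric as a percentage, as in the findings above, amounts to multiplying the returned fraction by 100.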
Abstract
We address the problem of scene flow: given a pair of stereo or RGB-D video frames, estimate pixelwise 3D motion. We introduce RAFT-3D, a new deep architecture for scene flow. RAFT-3D is based on the RAFT model developed for optical flow but iteratively updates a dense field of pixelwise SE3 motion instead of 2D motion. A key innovation of RAFT-3D is rigid-motion embeddings, which represent a soft grouping of pixels into rigid objects. Integral to rigid-motion embeddings is Dense-SE3, a differentiable layer that enforces geometric consistency of the embeddings. Experiments show that RAFT-3D achieves state-of-the-art performance. On FlyingThings3D, under the two-view evaluation, we improved the best published accuracy (d < 0.05) from 34.3% to 83.7%. On KITTI, we achieve an error of 5.77, outperforming the best published method (6.31), despite using no object instance supervision. Code is…
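To illustrate what a dense field of pixelwise SE3 motion means, here is a minimal sketch of how such a field induces scene flow: each pixel is lifted to a 3D point using its depth, moved by its own rotation and translation, and the displacement is the 3D motion. It assumes a pinhole camera and per-pixel rotation matrices. The function names, the intrinsics (fx, fy, cx, cy), and the array shapes are illustrative assumptions, not the RAFT-3D implementation, and the sketch covers neither the iterative updates nor Dense-SE3.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to 3D points (H, W, 3), assuming a pinhole camera."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def scene_flow_from_se3(points, R, t):
    """Apply a per-pixel rigid motion and return the 3D displacement.

    points: (H, W, 3) 3D points; R: (H, W, 3, 3) rotations; t: (H, W, 3) translations.
    """
    moved = np.einsum("hwij,hwj->hwi", R, points) + t  # R @ X + t at every pixel
    return moved - points

# Toy example: constant depth, identity rotation, shared translation along x.
depth = np.full((4, 5), 2.0)
points = backproject(depth, fx=100.0, fy=100.0, cx=2.0, cy=1.5)
R = np.tile(np.eye(3), (4, 5, 1, 1))
t = np.tile(np.array([0.1, 0.0, 0.0]), (4, 5, 1))
flow = scene_flow_from_se3(points, R, t)
```

With identity rotations the flow equals the translation at every pixel. Pixels that belong to the same rigid object would share one (R, t), which is the kind of grouping the rigid-motion embeddings are described as representing softly.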
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Optical measurement and interference techniques
