GMA3D: Local-Global Attention Learning to Estimate Occluded Motions of Scene Flow
Zhiyang Lu, Ming Cheng

TL;DR
GMA3D introduces a transformer-based module that leverages local and global semantic similarities to effectively estimate scene flow in occluded 3D point clouds, improving accuracy especially in real-world scenarios.
Contribution
This paper is the first to apply a transformer architecture to the occlusion problem in scene flow estimation on point clouds, exploiting the semantic self-similarity and motion consistency of moving objects.
Findings
Achieved state-of-the-art results on the KITTI dataset for occluded scene flow.
Demonstrated effectiveness of GMA3D on non-occluded datasets like FlyingThings3D.
Improved scene flow estimation accuracy in real-world occlusion scenarios.
Abstract
Scene flow represents the motion of each point in a 3D point cloud. It is a vital building block for many tasks, such as motion segmentation and object tracking. However, there are always occluded points between two consecutive point clouds, whether from sparse data sampling or real-world occlusion. In this paper, we focus on addressing occlusion in scene flow by exploiting the semantic self-similarity and motion consistency of moving objects. We propose GMA3D, a module based on the transformer framework, which uses local and global semantic similarity to infer the motion of occluded points from the motion of local and global non-occluded points, respectively, and then combines the two with an offset aggregator. Our module is the first to apply a transformer-based architecture to gauge the scene flow occlusion problem on point…
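The idea described in the abstract, inferring the motion of an occluded point from semantically similar non-occluded points at local and global scale, can be illustrated with a small sketch. This is not the authors' implementation: all function and parameter names (`attend`, `infer_occluded_motion`, `k_local`, `blend`) are hypothetical, the learned query/key/value projections of a real transformer are omitted, and a plain weighted blend stands in for the paper's offset aggregator, whose exact form is not given here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query_idx, key_indices, features, motion):
    """Similarity-weighted average of the motion vectors at key_indices.

    Weights come from scaled dot-product similarity between the query
    point's semantic feature and each key point's semantic feature.
    """
    q = features[query_idx]
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, features[j])) / scale
              for j in key_indices]
    weights = softmax(scores)
    dim = len(motion[0])
    return [sum(w * motion[j][c] for w, j in zip(weights, key_indices))
            for c in range(dim)]

def infer_occluded_motion(points, features, motion, occluded,
                          k_local=2, blend=0.5):
    """Fill in motion for occluded points from non-occluded ones.

    Local branch: attention over the k_local nearest visible points.
    Global branch: attention over all visible points.
    The two are mixed with a fixed weight (an assumption of this sketch,
    standing in for a learned aggregator).
    """
    visible = [j for j, occ in enumerate(occluded) if not occ]
    result = [list(m) for m in motion]
    for i, occ in enumerate(occluded):
        if not occ:
            continue
        nearest = sorted(visible,
                         key=lambda j: math.dist(points[i], points[j]))
        local = attend(i, nearest[:k_local], features, motion)
        global_ = attend(i, visible, features, motion)
        result[i] = [blend * l + (1.0 - blend) * g
                     for l, g in zip(local, global_)]
    return result

# Toy scene: object A (points 0-2) moves along x, object B (points 3-4) is static.
# Point 2 is occluded, so its motion is unknown and initialised to zero.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
          (5.0, 0.0, 0.0), (5.1, 0.0, 0.0)]
features = [[10.0, 0.0], [10.0, 0.0], [10.0, 0.0],
            [0.0, 10.0], [0.0, 10.0]]
motion = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
occluded = [False, False, True, False, False]

flows = infer_occluded_motion(points, features, motion, occluded)
print(flows[2])
```

In this toy scene the occluded point shares its semantic feature with the visible points of object A, so both branches assign it a flow close to that object's motion while the dissimilar points of object B receive negligible weight.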
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Human Motion and Animation
