TL;DR
MaskFusion is a real-time RGB-D SLAM system that recognizes, segments, and reconstructs multiple moving objects with semantic labels, enabling dynamic scene understanding and augmented reality applications.
Contribution
MaskFusion introduces a real-time SLAM system that tracks and reconstructs multiple independently moving objects without requiring known object models, using instance-level semantic segmentation to build an object-aware map.
Findings
Recognizes and reconstructs multiple moving objects in real time.
Fuses semantic labels into an object-aware map.
Supports augmented reality applications in dynamic scenes.
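To make the idea of fusing semantic labels into an object-aware map concrete, here is a minimal sketch in Python. It is not MaskFusion's implementation: the class name ObjectLabelMap, the (object_id, class_label) input format, and the majority-vote rule are all assumptions chosen for illustration. It only shows one simple way that per-frame instance labels could be accumulated per object so that a single noisy detection does not change the object's label.

```python
from collections import Counter, defaultdict


class ObjectLabelMap:
    """Hypothetical per-object label store (illustrative, not from the paper)."""

    def __init__(self):
        # object id -> how often each class label was observed for it
        self.votes = defaultdict(Counter)

    def fuse(self, frame_detections):
        """Add one frame of (object_id, class_label) observations."""
        for object_id, class_label in frame_detections:
            self.votes[object_id][class_label] += 1

    def label(self, object_id):
        """Return the most frequently observed label, or None if unseen."""
        if object_id not in self.votes:
            return None
        return self.votes[object_id].most_common(1)[0][0]


# Three frames of assumed detections; object 1 is mislabeled once.
label_map = ObjectLabelMap()
label_map.fuse([(1, "cup"), (2, "keyboard")])
label_map.fuse([(1, "cup"), (2, "book")])
label_map.fuse([(1, "bowl"), (2, "keyboard")])

print(label_map.label(1))
print(label_map.label(2))
```

Majority voting is used here only because it is the simplest fusion rule to read; a real system could weight observations by mask confidence or overlap instead.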
Abstract
We present MaskFusion, a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems which output a purely geometric map of a static scene. MaskFusion recognizes, segments and assigns semantic class labels to different objects in the scene, while tracking and reconstructing them even when they move independently from the camera. As an RGB-D camera scans a cluttered scene, image-based instance-level semantic segmentation creates semantic object masks that enable real-time object recognition and the creation of an object-level representation for the world map. Unlike previous recognition-based SLAM systems, MaskFusion does not require known models of the objects it can recognize, and can deal with multiple independent motions. MaskFusion takes full advantage of using instance-level semantic segmentation to enable semantic labels to be fused into…
