Object-RPE: Dense 3D Reconstruction and Pose Estimation with Convolutional Neural Networks for Warehouse Robots
Dinh-Cuong Hoang, Todor Stoyanov, and Achim J. Lilienthal

TL;DR
This paper introduces Object-RPE, a system combining CNNs and dense SLAM to achieve dense 3D scene reconstruction and accurate 6D object pose estimation in warehouse environments, improving robustness and accuracy over existing methods.
Contribution
It presents a novel framework integrating CNNs with dense SLAM for multi-view 6D pose estimation and semantic reconstruction in large environments, outperforming prior single-view approaches.
Findings
Enhanced 3D reconstruction quality demonstrated on datasets.
Improved 6D pose estimation accuracy over state-of-the-art.
Effective multi-view fusion increases robustness of object detection.
Abstract
We present an approach for recognizing all objects in a scene and estimating their full pose from an accurate 3D instance-aware semantic reconstruction using an RGB-D camera. Our framework couples convolutional neural networks (CNNs) and a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, to achieve both high-quality semantic reconstruction and robust 6D pose estimation for relevant objects. While the main trend in CNN-based 6D pose estimation has been to infer an object's position and orientation from single views of the scene, our approach explores performing pose estimation from multiple viewpoints, under the conjecture that combining multiple predictions can improve the robustness of an object detection system. The resulting system is capable of producing high-quality object-aware semantic reconstructions of room-sized environments, as…
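The multi-view conjecture in the abstract can be illustrated with a small sketch. This is not the authors' fusion method, which this excerpt does not describe; it is a generic confidence-weighted average, assuming each view supplies a camera-to-world transform (for example from a SLAM tracker) and a per-view object pose in the camera frame. The function name `fuse_poses` and the `weights` parameter are hypothetical.

```python
import numpy as np

def fuse_poses(cam_to_world, obj_in_cam, weights):
    """Fuse per-view 6D object poses into a single world-frame pose.

    cam_to_world: sequence of 4x4 camera-to-world transforms (assumed given).
    obj_in_cam:   sequence of 4x4 object poses in each camera frame.
    weights:      per-view confidences (hypothetical; any positive numbers).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Express every single-view prediction in the common world frame.
    world = np.array([Twc @ Tco for Twc, Tco in zip(cam_to_world, obj_in_cam)])

    # Translation: plain weighted mean.
    t = np.einsum("i,ij->j", w, world[:, :3, 3])

    # Rotation: weighted sum of rotation matrices, projected back onto SO(3)
    # with an SVD (the chordal L2 mean).
    M = np.einsum("i,ijk->jk", w, world[:, :3, :3])
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # guard against reflections
    R = U @ D @ Vt

    fused = np.eye(4)
    fused[:3, :3] = R
    fused[:3, 3] = t
    return fused
```

When all views agree on the object's world pose, the fused result reproduces it; when they disagree, noisy single-view estimates are pulled toward a consensus, which is the intuition behind combining multiple predictions. A simple mean like this is sensitive to gross outliers, so a practical system would likely add outlier rejection.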
