Multi-View Fusion for Multi-Level Robotic Scene Understanding
Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan Birchfield

TL;DR
This paper presents a multi-level scene understanding system for robotic manipulation. From a sequence of camera-in-hand RGB images, it produces a point cloud of scene surfaces for obstacle avoidance, rough poses of unknown objects that match primitive shapes, and full 6-DoF poses of known objects.
Contribution
It develops a unified framework that combines point cloud generation, pose estimation for primitive-shaped objects, and full 6-DoF pose estimation of known objects, all from RGB images, to give a robot scene awareness.
Findings
Effective multi-level scene representation demonstrated
Modules are complementary and improve manipulation tasks
System enhances obstacle avoidance and object localization
Abstract
We present a system for multi-level scene awareness for robotic manipulation. Given a sequence of camera-in-hand RGB images, the system calculates three types of information: 1) a point cloud representation of all the surfaces in the scene, for the purpose of obstacle avoidance; 2) the rough pose of unknown objects from categories corresponding to primitive shapes (e.g., cuboids and cylinders); and 3) full 6-DoF pose of known objects. By developing and fusing recent techniques in these domains, we provide a rich scene representation for robot awareness. We demonstrate the importance of each of these modules, their complementary nature, and the potential benefits of the system in the context of robotic manipulation.
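As an illustration of the three levels of information listed in the abstract, the following minimal Python sketch shows one way such a scene representation could be held in a single container. It is not the authors' implementation: every class, field, and method name here (`SceneRepresentation`, `PrimitiveObject`, `KnownObject`, `clearance`) is hypothetical, the object name "mug" is a placeholder, and the nearest-point clearance check is only a stand-in for whatever obstacle-avoidance logic a real planner would use.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]
Quaternion = Tuple[float, float, float, float]  # assumed (x, y, z, w) order


@dataclass
class PrimitiveObject:
    """Level 2: rough pose of an unknown object fitted to a primitive shape."""
    shape: str          # e.g. "cuboid" or "cylinder"
    position: Point     # rough centre in the world frame
    dimensions: Point   # rough extents along the shape's axes


@dataclass
class KnownObject:
    """Level 3: full 6-DoF pose of a known object."""
    name: str
    position: Point
    orientation: Quaternion


@dataclass
class SceneRepresentation:
    """Hypothetical container for the three levels of scene information."""
    surface_points: List[Point] = field(default_factory=list)        # level 1
    primitives: List[PrimitiveObject] = field(default_factory=list)  # level 2
    known_objects: List[KnownObject] = field(default_factory=list)   # level 3

    def clearance(self, query: Point) -> float:
        """Distance from `query` to the nearest surface point (inf if none)."""
        if not self.surface_points:
            return math.inf
        return min(math.dist(query, p) for p in self.surface_points)


# Toy scene with made-up values, for illustration only.
scene = SceneRepresentation(
    surface_points=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    primitives=[PrimitiveObject("cylinder", (0.5, 0.2, 0.0), (0.05, 0.05, 0.12))],
    known_objects=[KnownObject("mug", (0.3, 0.1, 0.0), (0.0, 0.0, 0.0, 1.0))],
)
```

Keeping the three levels side by side reflects their complementary roles as described in the abstract: the point cloud covers all surfaces for obstacle avoidance, while the two object lists carry the pose information for unknown and known objects respectively.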
Taxonomy
Topics: Image and Object Detection Techniques · 3D Surveying and Cultural Heritage · Robotics and Sensor-Based Localization
