RGB-Only Reconstruction of Tabletop Scenes for Collision-Free Manipulator Control
Zhenggang Tang, Balakumar Sundaralingam, Jonathan Tremblay, Bowen Wen, Ye Yuan, Stephen Tyree, Charles Loop, Alexander Schwing, Stan Birchfield

TL;DR
This paper introduces a system that lets a robot manipulator reach a desired pose in a tabletop scene without collisions using only RGB images. It reconstructs the 3D geometry with a NeRF-like approach and controls the robot via model predictive control.
Contribution
The novel approach reconstructs 3D scene geometry from RGB images alone and integrates it with model predictive control for collision-free manipulation.
Findings
Successful 3D reconstruction from RGB images without depth.
Effective collision avoidance in real tabletop scenes.
Real-world dataset demonstrating system performance.
Abstract
We present a system for collision-free control of a robot manipulator that uses only RGB views of the world. Perceptual input of a tabletop scene is provided by multiple images of an RGB camera (without depth) that is either handheld or mounted on the robot end effector. A NeRF-like process is used to reconstruct the 3D geometry of the scene, from which the Euclidean full signed distance function (ESDF) is computed. A model predictive control algorithm is then used to control the manipulator to reach a desired pose while avoiding obstacles in the ESDF. We show results on a real dataset collected and annotated in our lab.
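To make the ESDF step concrete, the sketch below computes a signed distance field on a tiny 2D occupancy grid by brute force and turns a distance value into a collision penalty. This is an illustration only, not the authors' implementation: the function names `signed_distance_field` and `collision_cost`, the sign convention (positive in free space, negative inside obstacles), the 2D grid, and the hinge-shaped penalty are all assumptions made for this example. A practical system would work on a 3D voxel grid with a much faster distance transform.

```python
import math


def signed_distance_field(occupied, voxel_size=1.0):
    """Brute-force signed distance on a 2D occupancy grid (hypothetical helper).

    Distances are measured between cell centers. Free cells get the positive
    distance to the nearest occupied cell; occupied cells get the negative
    distance to the nearest free cell.
    """
    rows, cols = len(occupied), len(occupied[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    occ = [(r, c) for r, c in cells if occupied[r][c]]
    free = [(r, c) for r, c in cells if not occupied[r][c]]

    field = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        inside = bool(occupied[r][c])
        targets = free if inside else occ
        if targets:
            d = min(math.hypot(r - tr, c - tc) for tr, tc in targets)
        else:
            d = math.inf  # no opposite-type cell exists in the grid
        field[r][c] = (-d if inside else d) * voxel_size
    return field


def collision_cost(distance, margin):
    """Hinge penalty (an assumed cost shape): zero beyond the safety margin,
    growing linearly as the signed distance drops below it."""
    return max(0.0, margin - distance)


# A 5x5 grid with a single occupied cell in the center.
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1
field = signed_distance_field(grid)
```

A controller in the spirit of the abstract could sum `collision_cost` over points sampled on the robot for each candidate trajectory, so that trajectories passing closer than `margin` to an obstacle are penalized.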
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
