RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation
Boshi An, Yiran Geng, Kai Chen, Xiaoqi Li, Qi Dou, Hao Dong

TL;DR
This paper introduces an image-only robotic manipulation framework that uses an active monocular camera to estimate object poses. Reinforcement learning balances perception accuracy against manipulation efficiency, and the method reports state-of-the-art results.
Contribution
The novel framework enables effective 6D object pose estimation and manipulation using only RGB images with active perception and reinforcement learning, eliminating the need for point-cloud data.
Findings
Achieves state-of-the-art manipulation accuracy in both simulation and real-world tests.
Effectively balances pose estimation accuracy and manipulation speed.
Demonstrates robustness of the approach in diverse environments.
Abstract
Robotic manipulation requires accurate perception of the environment, which poses a significant challenge due to its inherent complexity and constantly changing nature. In this context, RGB image and point-cloud observations are two commonly used modalities in vision-based robotic manipulation, but each of these modalities has its own limitations. Commercial point-cloud observations often suffer from issues like sparse sampling and noisy output due to the limits of the emission-reception imaging principle. On the other hand, RGB images, while rich in texture information, lack the depth and 3D information crucial for robotic manipulation. To mitigate these challenges, we propose an image-only robotic manipulation framework that leverages an eye-on-hand monocular camera installed on the robot's parallel gripper. By moving with the robot gripper, this camera gains the ability to…
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Visual Attention and Saliency Detection
