TL;DR
This paper introduces a robotic grasping system that uses a single external monocular RGB camera. Two neural networks trained on synthetic data estimate the object-to-camera and robot-to-camera poses, and the object-to-robot pose is derived indirectly from the two, so the camera can be repositioned freely during grasping.
Contribution
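The approach combines the outputs of two neural networks, both trained entirely on synthetic data, to compute the object-to-robot pose indirectly. Because the robot-to-camera network acts as an online camera calibration, the camera can be moved during operation without affecting grasp quality.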
Findings
Effective pose estimation with synthetic data and domain randomization
Robust grasping performance across various camera placements and resolutions
Provision of textured models and pre-trained weights for reproducibility
Abstract
We present a robotic grasping system that uses a single external monocular RGB camera as input. The object-to-robot pose is computed indirectly by combining the output of two neural networks: one that estimates the object-to-camera pose, and another that estimates the robot-to-camera pose. Both networks are trained entirely on synthetic data, relying on domain randomization to bridge the sim-to-real gap. Because the latter network performs online camera calibration, the camera can be moved freely during execution without affecting the quality of the grasp. Experimental results analyze the effect of camera placement, image resolution, and pose refinement in the context of grasping several household objects. We also present results on a new set of 28 textured household toy grocery objects, which have been selected to be accessible to other researchers. To aid reproducibility of the…
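The indirect pose computation described in the abstract can be sketched as a composition of two rigid transforms. This is a minimal illustration, not the authors' implementation: it assumes each network's output has already been converted to a 4x4 homogeneous matrix that maps points from the object (or robot) frame into the camera frame. The names `cam_T_object`, `cam_T_robot`, and `object_in_robot_frame` are hypothetical, and all numeric values are made up.

```python
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def invert_pose(pose):
    """Invert a rigid transform using R^T and -R^T t (no general matrix inverse)."""
    rotation = pose[:3, :3]
    translation = pose[:3, 3]
    inverse = np.eye(4)
    inverse[:3, :3] = rotation.T
    inverse[:3, 3] = -rotation.T @ translation
    return inverse

def object_in_robot_frame(cam_T_object, cam_T_robot):
    """Assumed convention: cam_T_x maps points from frame x into the camera frame.
    The camera frame cancels out: robot_T_object = inv(cam_T_robot) @ cam_T_object."""
    return invert_pose(cam_T_robot) @ cam_T_object

def rot_z(angle_rad):
    """Rotation about the z axis, used only to build example poses."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Made-up ground truth: where the object sits relative to the robot base.
robot_T_object_true = make_pose(rot_z(0.3), [0.5, 0.1, 0.0])

# Two made-up camera placements, standing in for a camera that was moved.
cam_T_robot_a = make_pose(rot_z(1.0), [0.0, 0.2, 1.5])
cam_T_robot_b = make_pose(rot_z(-0.4), [0.3, -0.1, 2.0])

results = []
for cam_T_robot in (cam_T_robot_a, cam_T_robot_b):
    # What an ideal object-pose estimator would report from this viewpoint.
    cam_T_object = cam_T_robot @ robot_T_object_true
    results.append(object_in_robot_frame(cam_T_object, cam_T_robot))
```

With ideal estimates, both camera placements recover the same object-to-robot pose, which is why re-estimating the robot-to-camera pose online lets the camera move. In practice each network's error propagates into the composed pose.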
