Grasp, See, and Place: Efficient Unknown Object Rearrangement with Policy Structure Prior
Kechun Xu, Zhongxiang Zhou, Jun Wu, Haojian Lu, Rong Xiong, Yue Wang

TL;DR
This paper introduces GSP, a dual-loop system for unknown object rearrangement. It uses the observation that perception noise affects grasping and placing in a decoupled way as a structural prior, and draws on foundation models, to improve task success under perception noise.
Contribution
The paper shows theoretically that noisy perception impacts grasp and place in a decoupled way, and builds a dual-loop system that takes this decoupled structure as a prior and uses foundation models.
Findings
Higher completion rates in rearrangement tasks
Fewer steps required for successful rearrangement
Robustness to perception noise demonstrated
Abstract
We focus on the task of unknown object rearrangement, where a robot is supposed to re-configure the objects into a desired goal configuration specified by an RGB-D image. Recent works explore unknown object rearrangement systems by incorporating learning-based perception modules. However, they are sensitive to perception error, and pay less attention to task-level performance. In this paper, we aim to develop an effective system for unknown object rearrangement amidst perception noise. We theoretically reveal that the noisy perception impacts grasp and place in a decoupled way, and show such a decoupled structure is valuable to improve task optimality. We propose GSP, a dual-loop system with the decoupled structure as prior. For the inner loop, we learn a see policy for self-confident in-hand object matching. For the outer loop, we learn a grasp policy aware of object matching and grasp…
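The dual-loop idea in the abstract can be pictured with a toy control-flow sketch. This is not the authors' implementation: the function names (`inner_see_loop`, `outer_loop`, `toy_matcher_for`), the confidence threshold, the view budget, and the first-in-list grasp choice are all hypothetical stand-ins, and the learned grasp and see policies are replaced by trivial rules. It only illustrates an outer loop that picks an object and an inner loop that keeps looking at the in-hand object until the match to a goal slot is confident enough.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A matcher maps a view index to (goal_index, confidence). Hypothetical interface.
Matcher = Callable[[int], Tuple[int, float]]


def inner_see_loop(match_fn: Matcher, max_views: int = 5,
                   threshold: float = 0.8) -> Tuple[Optional[int], float, int]:
    """Inner loop (assumed form): view the in-hand object from successive
    viewpoints until the match confidence reaches the threshold.
    Returns (goal_index, confidence, views_used)."""
    best_goal: Optional[int] = None
    best_conf = 0.0
    for view in range(max_views):
        goal, conf = match_fn(view)
        if conf > best_conf:
            best_goal, best_conf = goal, conf
        if conf >= threshold:
            return goal, conf, view + 1
    # View budget exhausted: fall back to the most confident match seen.
    return best_goal, best_conf, max_views


def outer_loop(objects: List[str], matcher_for: Callable[[str], Matcher],
               max_steps: int = 20) -> Tuple[Dict[str, Optional[int]], int]:
    """Outer loop (assumed form): grasp one object per step, run the inner
    loop on it, then place it at the matched goal slot."""
    remaining = list(objects)
    placed: Dict[str, Optional[int]] = {}
    steps = 0
    while remaining and steps < max_steps:
        obj = remaining.pop(0)  # stand-in for a learned grasp policy
        goal, _conf, _views = inner_see_loop(matcher_for(obj))
        placed[obj] = goal      # stand-in for the place action
        steps += 1
    return placed, steps


def toy_matcher_for(obj: str) -> Matcher:
    """Toy matcher: confidence grows with each new view; goal slot is fixed."""
    goal_slot = {"mug": 0, "box": 1}[obj]
    return lambda view: (goal_slot, 0.5 + 0.2 * view)
```

With the toy matcher, each object needs three views before the confidence passes the assumed threshold of 0.8, after which it is placed at its matched slot.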
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Organ Donation and Transplantation
Methods: Focus · Contrastive Language-Image Pre-training · Attentive Walk-Aggregating Graph Neural Network
