TL;DR
This paper introduces an object-centric perception method that uses pretrained visual models and attention mechanisms to improve generalization in robotic manipulation tasks learned from only a few samples.
Contribution
It presents a novel object-level attention framework that integrates pretrained visual models into robot learning, enabling better generalization and sample efficiency.
Findings
Good generalization across object instances with few samples
Effective learning of diverse manipulation tasks via reinforcement learning
Flexible adjustment of attention scope with different demonstrations
Abstract
Robotic manipulation in complex open-world scenarios requires both reliable physical manipulation skills and effective, generalizable perception. In this paper, we propose a method where general-purpose pretrained visual models serve as an object-centric prior for the perception system of a learned policy. We devise an object-level attentional mechanism that can be used to determine relevant objects from a few trajectories or demonstrations, and then immediately incorporate those objects into a learned policy. A task-independent meta-attention locates possible objects in the scene, and a task-specific attention identifies which objects are predictive of the trajectories. The scope of the task-specific attention is easily adjusted by showing demonstrations with distractor objects or with diverse relevant objects. Our results indicate that this approach exhibits good generalization…
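The two-stage attention described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes each candidate object is already summarized as a feature vector plus an objectness score (as a pretrained visual model might provide), treats the task-independent meta-attention as a simple score threshold, and models the task-specific attention as a softmax over dot products with a query vector. The names `meta_attention`, `task_attention`, `min_score`, `query`, and `temperature` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def meta_attention(candidates: Sequence[Tuple[Vector, float]],
                   min_score: float) -> List[Vector]:
    """Task-independent stage (hypothetical): keep every candidate whose
    objectness score clears a fixed threshold, regardless of the task."""
    return [feat for feat, score in candidates if score >= min_score]


def task_attention(objects: Sequence[Vector], query: Vector,
                   temperature: float = 1.0) -> Tuple[List[float], Vector]:
    """Task-specific stage (hypothetical): score each object against a query,
    normalize with softmax, and return the weights plus the attention-weighted
    object feature that a downstream policy could consume."""
    if not objects:
        raise ValueError("meta-attention found no candidate objects")
    scores = [dot(query, obj) / temperature for obj in objects]
    weights = softmax(scores)
    dim = len(objects[0])
    pooled = [sum(w * obj[i] for w, obj in zip(weights, objects))
              for i in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    # Two confident detections and one low-objectness patch.
    candidates = [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.8), ([0.5, 0.5], 0.1)]
    objects = meta_attention(candidates, min_score=0.5)
    # A query aligned with the first object, standing in for whatever
    # fitting the attention to demonstrations would produce.
    weights, pooled = task_attention(objects, query=[4.0, 0.0])
    print([round(w, 3) for w in weights])  # prints [0.982, 0.018]
```

In this sketch the attention scope is set entirely by `query` (sharpened or softened by `temperature`); in the paper, the scope is instead shaped by which demonstrations the task-specific attention is fit to, for example demonstrations containing distractors or diverse relevant objects.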
