Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models
Junyao Shi, Jianing Qian, Yecheng Jason Ma, Dinesh Jayaraman

TL;DR
POCR is a framework that combines off-the-shelf pre-trained models into object-centric representations for robotics. It improves control and generalization, and the representations themselves need no additional training.
Contribution
The paper introduces a method for composing existing pre-trained models into object-centric representations for robotics, which improves performance and generalization.
Findings
Outperforms state-of-the-art pre-trained representations in robotic tasks
Enables systematic generalization in robotic manipulation
No additional training needed for the object-centric representations
Abstract
There have recently been large advances both in pre-training visual representations for robotic control and in segmenting unknown-category objects in general images. To leverage these for improved robot learning, we propose POCR, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of "what-where" representations in psychology and computer vision, we use segmentations from a pre-trained model to stably locate various entities in the scene across timesteps, capturing "where" information. To each such segmented entity, we apply other pre-trained models that build vector descriptions suitable for robotic control tasks, thus capturing "what" the entity is. Thus, our pre-trained object-centric representations for control are constructed by appropriately combining the outputs of off-the-shelf pre-trained models, with no new…
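The "what" and "where" composition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that segmentation masks are already available from some pre-trained "where" model, that "where" is summarized as a normalized bounding box, and that `toy_what_encoder` (a hypothetical stand-in that averages colors) takes the place of a frozen pre-trained "what" encoder. The names `mask_to_box`, `toy_what_encoder`, and `compose_slots` are illustrative only.

```python
import numpy as np


def mask_to_box(mask):
    """Summarize a boolean mask as a normalized (x_min, y_min, x_max, y_max) box.

    Assumption: a bounding box is one simple way to encode "where".
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    if ys.size == 0:  # empty mask carries no location information
        return np.zeros(4, dtype=np.float32)
    return np.array(
        [xs.min() / w, ys.min() / h, (xs.max() + 1) / w, (ys.max() + 1) / h],
        dtype=np.float32,
    )


def toy_what_encoder(pixels):
    """Hypothetical stand-in for a frozen pre-trained encoder: mean value per channel."""
    return pixels.reshape(-1, pixels.shape[-1]).mean(axis=0).astype(np.float32)


def compose_slots(image, masks, what_encoder=toy_what_encoder):
    """Build one vector per segmented entity by concatenating "what" and "where".

    image: float array of shape (H, W, C)
    masks: non-empty sequence of boolean arrays of shape (H, W), one per entity
    Returns an array of shape (num_entities, what_dim + 4).
    """
    slots = []
    for mask in masks:
        masked = image * mask[..., None]  # zero out pixels outside the entity
        what = what_encoder(masked)       # "what": descriptor of the entity
        where = mask_to_box(mask)         # "where": location of the entity
        slots.append(np.concatenate([what, where]))
    return np.stack(slots)


# Usage: a 4x4 RGB image with two entities, one in each of two corners.
image = np.ones((4, 4, 3), dtype=np.float32)
top_left = np.zeros((4, 4), dtype=bool)
top_left[:2, :2] = True
bottom_right = np.zeros((4, 4), dtype=bool)
bottom_right[2:, 2:] = True
slots = compose_slots(image, [top_left, bottom_right])
print(slots.shape)
```

Neither stand-in is trained here, which mirrors the claim that the representation is assembled purely from the outputs of existing models. A downstream policy would consume the resulting per-entity vectors.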
Taxonomy
Topics: Robotics and Sensor-Based Localization · Robotics and Automated Systems · Modular Robots and Swarm Intelligence
