Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations
Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta Dwibedi

TL;DR
This paper demonstrates that object-aware self-supervised representations significantly improve robotic control and object localization in multi-object scenes, especially in low-data scenarios, compared to object-agnostic methods.
Contribution
It introduces an object-aware self-supervised learning approach for robotic tasks, enhancing control and localization performance over existing object-agnostic techniques.
Findings
20% performance increase in low-data policy training
Outperforms object-agnostic methods in scene understanding
Effective in multi-object scene control and localization
Abstract
Perceptual understanding of the scene and the relationship between its different components is important for successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most of the current methodologies learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large labeled datasets for each task that are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object-agnostic, and we demonstrate that the resulting representations are insufficient for general-purpose robotics tasks as they fail to capture the complexity of scenes with many components. In this paper, we…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
