Playful Interactions for Representation Learning
Sarah Young, Jyothish Pari, Pieter Abbeel, Lerrel Pinto

TL;DR
This paper introduces a self-supervised approach using playful interactions to learn visual representations, reducing the need for task-specific demonstrations in imitation learning for tasks like pushing and stacking.
Contribution
It proposes leveraging diverse, task-agnostic play data for self-supervised visual representation learning, improving efficiency and generalization in downstream imitation tasks.
Findings
Play data is diverse and task-agnostic, aiding representation learning.
The method achieves performance similar to the behavior cloning baseline while using half as many demonstrations.
Representations trained from scratch on play data outperform ImageNet-pretrained features.
Abstract
One of the key challenges in visual imitation learning is collecting large amounts of expert demonstrations for a given task. While methods for collecting human demonstrations are becoming easier with teleoperation methods and the use of low-cost assistive tools, we often still require 100-1000 demonstrations for every task to learn a visual representation and policy. To address this, we turn to an alternate form of data that does not require task-specific demonstrations -- play. Playing is a fundamental method children use to learn a set of skills and behaviors and visual representations in early learning. Importantly, play data is diverse, task-agnostic, and relatively cheap to obtain. In this work, we propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks. We collect 2 hours of playful data in 19 diverse environments and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
