SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks

Xingyu Lin; John So; Sashwat Mahalingam; Fangchen Liu; Pieter Abbeel

arXiv:2307.03567 · cs.RO · October 24, 2023 · 1 citation

TL;DR

SpawnNet introduces a two-stream architecture that fuses pre-trained visual features to enhance the generalization of visuomotor policies across categories, outperforming prior methods in imitation learning tasks.

Contribution

The paper proposes SpawnNet, a novel architecture that improves categorical generalization of policies by learning to fuse multi-layer pre-trained visual representations.

Findings

1. Significantly better categorical generalization in simulated experiments.
2. Effective transfer to real-world scenarios.
3. Stronger performance than prior approaches in imitation learning.

Abstract

Existing internet-scale image and video datasets cover a wide range of everyday objects and tasks, raising the possibility of learning policies that generalize across diverse scenarios. Prior works have explored visual pre-training with different self-supervised objectives, yet the generalization capabilities of the learned policies and their advantages over well-tuned baselines remain unclear from these studies. In this work, we present a focused study of the generalization capabilities of pre-trained visual representations at the categorical level. We identify the key bottleneck in using a frozen pre-trained visual backbone for policy learning and then propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy. Through extensive simulated and real experiments, we show significantly…
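To make the two-stream idea concrete, here is a minimal, framework-free sketch of the data flow only, under loudly stated assumptions: the frozen backbone is a hypothetical stack of random linear+ReLU layers standing in for a real pre-trained encoder, each of its layer outputs is passed through a hypothetical per-stage "adapter" projection, and fusion is assumed to be simple concatenation with the trainable stream's hidden state. Layer sizes, class names (FrozenBackbone, TwoStreamPolicy), and the fusion rule are illustrative choices, not the paper's implementation, and training is omitted entirely.

```python
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class FrozenBackbone:
    """Hypothetical stand-in for a frozen pre-trained encoder.

    Returns one feature vector per layer (multi-layer representations).
    Its weights are fixed; this sketch contains no training loop at all.
    """

    def __init__(self, in_dim, layer_dims, rng):
        dims = [in_dim] + layer_dims
        self.layers = [rand_matrix(dims[i + 1], dims[i], rng)
                       for i in range(len(layer_dims))]

    def features(self, x):
        feats, h = [], x
        for W in self.layers:
            h = relu(matvec(W, h))
            feats.append(h)
        return feats


class TwoStreamPolicy:
    """Hypothetical separate (trainable) stream that fuses backbone features.

    At each stage an adapter projects the matching pre-trained feature to
    `hid` dims; it is concatenated with the stream's own hidden state and
    passed through that stage's layer. A linear head outputs the action.
    """

    def __init__(self, in_dim, pre_dims, hid, act_dim, rng):
        self.adapters = [rand_matrix(hid, d, rng) for d in pre_dims]
        self.stages, prev = [], in_dim
        for _ in pre_dims:
            self.stages.append(rand_matrix(hid, prev + hid, rng))
            prev = hid
        self.head = rand_matrix(act_dim, hid, rng)

    def act(self, x, pre_feats):
        h = x
        for adapter, stage, f in zip(self.adapters, self.stages, pre_feats):
            fused = h + relu(matvec(adapter, f))  # list concatenation
            h = relu(matvec(stage, fused))
        return matvec(self.head, h)


rng = random.Random(0)
backbone = FrozenBackbone(in_dim=8, layer_dims=[16, 12], rng=rng)
policy = TwoStreamPolicy(in_dim=8, pre_dims=[16, 12], hid=6, act_dim=3, rng=rng)
obs = [rng.uniform(-1.0, 1.0) for _ in range(8)]
feats = backbone.features(obs)
action = policy.act(obs, feats)
print(len(action))  # 3
```

The point of the split is that only the second stream (adapters, stages, head) would be optimized during policy learning, while the pre-trained stream is reused as-is; how SpawnNet actually chooses layers and fuses them is described in the paper, not here.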

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition