Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods

Ya Jing; Xuelin Zhu; Xingbin Liu; Qie Sima; Taozheng Yang; Yunhai Feng; Tao Kong

arXiv:2308.03620 · cs.RO · August 8, 2023 · 1 citation

Open Access

TL;DR

This paper investigates the impact of visual pre-training strategies on robot manipulation tasks, exploring datasets, models, and methods, and introduces a new scheme called Vi-PRoM that combines self-supervised and supervised learning for improved robot performance.

Contribution

It provides a comprehensive analysis of visual pre-training effects and proposes Vi-PRoM, a novel scheme integrating contrastive and supervised learning for robot manipulation.

Findings

1. Vi-PRoM outperforms existing methods in simulation and real robot experiments.

2. Contrastive learning effectively captures underlying patterns from unlabeled data.

3. Combining self-supervised and supervised learning enhances robot manipulation capabilities.

Abstract

Visual pre-training with large-scale real-world data has made great progress in recent years, showing great potential in robot learning with pixel observations. However, the recipes of visual pre-training for robot manipulation tasks are yet to be built. In this paper, we thoroughly investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives: pre-training datasets, model architectures and training methods. Several significant experimental findings are provided that are beneficial for robot learning. Further, we propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning. Concretely, the former employs contrastive learning to acquire underlying patterns from large-scale unlabeled data, while the latter aims to learn visual semantics and temporal…
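
For readers unfamiliar with the contrastive learning mentioned in the abstract, the following is a minimal sketch of a generic InfoNCE-style objective, written in plain Python. It is an illustration only: the abstract does not specify which contrastive loss, encoder, or temperature Vi-PRoM uses, so the function names (`l2_normalize`, `info_nce_loss`), the temperature of 0.1, and the toy feature vectors are all assumptions made here. Each query is pulled toward the key at the same index and pushed away from every other key in the batch.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(queries, keys, temperature=0.1):
    """Mean InfoNCE loss; keys[i] is treated as the positive for queries[i].

    Hypothetical helper for illustration, not the paper's implementation.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled similarity of this query to every key in the batch.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the positive key.
        total += log_denominator - logits[i]
    return total / len(q)


# Toy features standing in for two augmented views of three images (assumed data).
views_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
views_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

matched = info_nce_loss(views_a, views_b)           # positives correctly paired
mismatched = info_nce_loss(views_a, views_b[::-1])  # pairing deliberately broken
```

In this sketch the loss is lower when each query is paired with its own view (`matched`) than when the pairing is scrambled (`mismatched`), which is the signal a contrastive pre-training stage would optimize over unlabeled images.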

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Learning