Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
Ya Jing, Xuelin Zhu, Xingbin Liu, Qie Sima, Taozheng Yang, Yunhai Feng, Tao Kong

TL;DR
This paper investigates the impact of visual pre-training strategies on robot manipulation tasks, exploring datasets, models, and methods, and introduces a new scheme called Vi-PRoM that combines self-supervised and supervised learning for improved robot performance.
Contribution
It provides a comprehensive analysis of visual pre-training effects and proposes Vi-PRoM, a novel scheme integrating contrastive and supervised learning for robot manipulation.
Findings
Vi-PRoM outperforms existing methods in simulation and real robot experiments.
Contrastive learning effectively captures underlying patterns from unlabeled data.
Combining self-supervised and supervised learning enhances robot manipulation capabilities.
Abstract
Visual pre-training with large-scale real-world data has made great progress in recent years, showing great potential in robot learning with pixel observations. However, the recipes of visual pre-training for robot manipulation tasks are yet to be built. In this paper, we thoroughly investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives: pre-training datasets, model architectures and training methods. Several significant experimental findings are provided that are beneficial for robot learning. Further, we propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning. Concretely, the former employs contrastive learning to acquire underlying patterns from large-scale unlabeled data, while the latter aims to learn visual semantics and temporal…
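The contrastive learning step mentioned in the abstract can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the temperature of 0.1, and the toy 2-D embeddings are all hypothetical, and the paper's actual encoder and loss details are not given in this summary.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(queries, keys, temperature=0.1):
    """Mean InfoNCE loss where queries[i] and keys[i] are the positive pair.

    All other keys in the batch act as negatives for queries[i].
    The temperature value is an illustrative assumption.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of this query with every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching key as the target class.
        total += log_denominator - logits[i]
    return total / len(q)


# Toy embeddings standing in for two augmented views of the same two images.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce_loss(view_a, view_b)         # positives match
shuffled = info_nce_loss(view_a, view_b[::-1])  # positives mismatched
```

The loss is low when each embedding is closest to its own second view and high when the pairing is scrambled, which is the signal that lets an encoder learn from unlabeled images.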
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Contrastive Learning
