The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
Simone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav Gupta

TL;DR
Visual representations pre-trained on large-scale computer vision datasets can match or exceed ground-truth state inputs when training control policies across several domains, challenging the usual practice of training visuo-motor policies from scratch on in-domain data.
Contribution
This paper demonstrates that off-the-shelf pre-trained vision models can be competitive with, or superior to, ground-truth states for control, based on extensive empirical evaluation across four control domains (Habitat, DeepMind Control, Adroit, Franka Kitchen).
Findings
Pre-trained visual features can be competitive with, or even better than, ground-truth states in control tasks.
Out-of-domain vision data can be effectively used for control policy training.
The choice of representation training method, data augmentations, and feature hierarchy affects downstream control performance.
Abstract
Recent years have seen the emergence of pre-trained representations as a powerful abstraction for AI applications in computer vision, natural language, and speech. However, policy learning for control is still dominated by a tabula-rasa learning paradigm, with visuo-motor policies often trained from scratch using data from deployment environments. In this context, we revisit and study the role of pre-trained visual representations for control, and in particular representations trained on large-scale computer vision datasets. Through extensive empirical evaluation in diverse control domains (Habitat, DeepMind Control, Adroit, Franka Kitchen), we isolate and study the importance of different representation training methods, data augmentations, and feature hierarchies. Overall, we find that pre-trained visual representations can be competitive or even better than ground-truth state…
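The general setup the abstract describes, a frozen visual encoder feeding a small trainable policy, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the fixed random projection in `encode` is only a stand-in for a real pre-trained network, training by regression onto demonstration actions is assumed here for simplicity, the data is synthetic, and every name and dimension below is hypothetical.

```python
import random

random.seed(0)

IMG_DIM, FEAT_DIM, ACT_DIM = 16, 8, 2  # hypothetical sizes

# Stand-in for a frozen pre-trained encoder: a fixed random projection + ReLU.
# A real setup would load pre-trained weights here; this is NOT a real model.
ENCODER_W = [[random.gauss(0, 1) / IMG_DIM ** 0.5 for _ in range(IMG_DIM)]
             for _ in range(FEAT_DIM)]


def encode(image):
    """Map a flattened image to features. ENCODER_W is never updated."""
    return [max(0.0, sum(w * x for w, x in zip(row, image)))
            for row in ENCODER_W]


def policy(head, feats):
    """Linear policy head: one output per action dimension."""
    return [sum(w * f for w, f in zip(row, feats)) for row in head]


def mse(head, data):
    """Mean squared error between predicted and demonstrated actions."""
    total = 0.0
    for feats, action in data:
        pred = policy(head, feats)
        total += sum((p - a) ** 2 for p, a in zip(pred, action))
    return total / len(data)


def train_head(data, steps=200, lr=0.05):
    """Fit only the policy head with plain SGD; the encoder stays frozen."""
    head = [[0.0] * FEAT_DIM for _ in range(ACT_DIM)]
    for _ in range(steps):
        for feats, action in data:
            pred = policy(head, feats)
            for i in range(ACT_DIM):
                err = pred[i] - action[i]
                for j in range(FEAT_DIM):
                    head[i][j] -= lr * err * feats[j]
    return head


# Synthetic "demonstrations": actions are a linear function of the features.
expert = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(ACT_DIM)]
images = [[random.random() for _ in range(IMG_DIM)] for _ in range(64)]
data = [(encode(img), policy(expert, encode(img))) for img in images]

encoder_before = [row[:] for row in ENCODER_W]
initial_loss = mse([[0.0] * FEAT_DIM for _ in range(ACT_DIM)], data)
head = train_head(data)
final_loss = mse(head, data)
print(f"loss before: {initial_loss:.4f}  after: {final_loss:.4f}")
```

The point of the sketch is the division of labor: features are computed once by a component that receives no gradient updates, and only the small head is fit to the control data. Swapping the stand-in for an actual pre-trained vision model would leave `train_head` unchanged.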
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
