Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining
Qihang Zhang, Zhenghao Peng, Bolei Zhou

TL;DR
This paper introduces a contrastive pretraining method for visuomotor driving policies that learns from uncurated YouTube videos and improves downstream reinforcement learning and imitation learning performance.
Contribution
It proposes an action-conditioned contrastive pretraining approach that uses pseudo action labels, predicted for video frames by an inverse dynamics model trained on a small labeled set, to improve policy learning without extensive environment interactions.
Findings
Outperforms previous unsupervised and ImageNet pretraining methods
Significantly improves downstream reinforcement learning results
Effective in imitation learning scenarios
Abstract
Deep visuomotor policy learning, which aims to map raw visual observation to action, achieves promising results in control tasks such as robotic manipulation and autonomous driving. However, it requires a huge number of online interactions with the training environment, which limits its real-world application. Compared to the popular unsupervised feature learning for visual recognition, feature pretraining for visuomotor control tasks is much less explored. In this work, we aim to pretrain policy representations for driving tasks by watching hours-long uncurated YouTube videos. Specifically, we train an inverse dynamic model with a small amount of labeled data and use it to predict action labels for all the YouTube video frames. A new contrastive policy pretraining method is then developed to learn action-conditioned features from the video frames with pseudo action labels. Experiments…
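The abstract describes learning action-conditioned features from frames with pseudo action labels. The following is a minimal sketch of what such an objective could look like, not the authors' implementation: it assumes a supervised-contrastive (InfoNCE-style) loss in which two frames count as a positive pair when their pseudo action labels are close. The function name, the scalar action representation (e.g. a steering value), and the `threshold` and `temperature` parameters are all hypothetical choices made for illustration.

```python
import numpy as np

def action_conditioned_contrastive_loss(features, actions,
                                        threshold=0.1, temperature=0.1):
    """Illustrative loss; all names and defaults here are assumptions.

    features: (N, D) array of frame embeddings from the encoder.
    actions:  (N,) array of pseudo action labels, assumed to come from
              an inverse dynamics model applied to the video frames.
    """
    features = np.asarray(features, dtype=float)
    actions = np.asarray(actions, dtype=float)

    # L2-normalize so the dot product is a cosine similarity.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)

    # Positive pairs: different frames whose pseudo actions are close.
    action_gap = np.abs(actions[:, None] - actions[None, :])
    positives = (action_gap < threshold) & ~self_mask

    # Exclude each frame's similarity to itself from the softmax.
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable row-wise log-softmax.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability over each anchor's positives,
    # skipping anchors that have no positive in the batch.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy batch: frames 0-1 share one pseudo action, frames 2-3 another.
actions = np.array([0.0, 0.0, 1.0, 1.0])
aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
misaligned = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

loss_aligned = action_conditioned_contrastive_loss(aligned, actions)
loss_misaligned = action_conditioned_contrastive_loss(misaligned, actions)
```

In this toy batch the loss is lower when frames with the same pseudo action have similar embeddings, which is the property an action-conditioned representation is meant to have. Grouping by action rather than by image identity is the design choice that distinguishes this kind of objective from instance-level contrastive pretraining.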
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Technologies in Various Fields
