TL;DR
This survey reviews the emerging field of learning from internet videos, which aims to ease the data scarcity that constrains robot learning. It covers challenges such as distribution shift and missing action labels, and highlights future directions for scalable foundation models.
Contribution
It systematically examines learning-from-videos (LfV) concepts, challenges, and current methods, and discusses future opportunities for leveraging internet videos in robot learning.
Findings
Identifies key challenges such as distribution shift and missing action labels in video data.
Reviews current methods for extracting knowledge from internet videos.
Highlights the importance of scalable foundation models for future progress.
Abstract
Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video. This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots. This survey systematically examines the emerging field of LfV. We first outline essential concepts, including detailing fundamental LfV challenges such as distribution shift and missing action labels in video data. Next, we comprehensively review current methods for extracting knowledge from large-scale…
