Towards Generalist Robot Learning from Internet Video: A Survey

Robert McCarthy; Daniel C.H. Tan; Dominik Schmidt; Fernando Acero; Nathan Herr; Yilun Du; Thomas G. Thuruthel; Zhibin Li

arXiv:2404.19664 · cs.RO · July 24, 2025


TL;DR

This survey reviews the emerging field of learning from videos (LfV), which augments traditional robot data with large-scale internet video to ease data scarcity. It covers core challenges such as distribution shift and missing action labels, and highlights future directions for scalable foundation models.

Contribution

The survey systematically examines LfV concepts, challenges, and current methods, and discusses future opportunities for leveraging internet video in robot learning.

Findings

01 Identifies key LfV challenges such as distribution shift and missing action labels in video data.

02 Reviews current methods for extracting knowledge from large-scale internet video.

03 Highlights the importance of scalable foundation models for future progress.

Abstract

Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video. This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots. This survey systematically examines the emerging field of LfV. We first outline essential concepts, including detailing fundamental LfV challenges such as distribution shift and missing action labels in video data. Next, we comprehensively review current methods for extracting knowledge from large-scale…

