Skills Made to Order: Efficient Acquisition of Robot Cooking Skills Guided by Multiple Forms of Internet Data
Mrinal Verghese, Christopher Atkeson

TL;DR
This paper investigates using diverse internet data sources, and models trained on them, to efficiently select among robot behaviors for contact-rich skills such as cooking. Combining the sources yields a 79% success rate.
Contribution
It introduces a template selection method, guided by multiple forms of internet data, that outperforms existing approaches in robot skill acquisition.
Findings
LLMs effectively select templates despite lacking visual data
Optic flow features outperform video encoder features in template selection
Combining multiple internet data sources yields a 79% success rate in cooking skills
Abstract
This study explores the utility of various internet data sources to select among a set of template robot behaviors to perform skills. Learning contact-rich skills involving tool use from internet data sources has typically been challenging due to the lack of physical information such as contact existence, location, areas, and force in this data. Prior works have generally used internet data and foundation models trained on this data to generate low-level robot behavior. We hypothesize that these data and models may be better suited to selecting among a set of basic robot behaviors to perform these contact-rich skills. We explore three methods of template selection: querying large language models, comparing video of robot execution to retrieved human video using features from a pretrained video encoder common in prior work, and performing the same comparison using features from an optic…
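The video-comparison idea in the abstract (scoring each template's robot execution against retrieved human video in some feature space, then picking the best match) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the use of cosine similarity, the averaging over human clips, the function names `cosine_similarity` and `select_template`, the template names, and the toy three-dimensional feature vectors. In practice the features would come from a video encoder or an optic flow model.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def select_template(
    robot_features: Dict[str, Sequence[float]],
    human_features: List[Sequence[float]],
) -> str:
    """Return the template whose robot-execution features are, on average,
    most similar to the features of the retrieved human videos.

    Assumption: one feature vector per template execution and one per
    human clip. The paper's actual scoring rule may differ.
    """
    if not robot_features or not human_features:
        raise ValueError("need at least one template and one human clip")

    def mean_similarity(name: str) -> float:
        feats = robot_features[name]
        total = sum(cosine_similarity(feats, h) for h in human_features)
        return total / len(human_features)

    return max(robot_features, key=mean_similarity)


# Hypothetical templates and toy features, for illustration only.
robot_features = {
    "scoop": [1.0, 0.0, 0.2],
    "stir": [0.0, 1.0, 0.1],
}
human_features = [
    [0.1, 0.9, 0.0],
    [0.0, 1.0, 0.2],
]
best = select_template(robot_features, human_features)
```

With these toy vectors the human clips point mostly along the second feature dimension, so the hypothetical "stir" template is selected. Swapping the source of the feature vectors (encoder features versus optic flow features) leaves the selection logic unchanged, which is what makes the two video-based methods directly comparable.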
Taxonomy
Topics: Robotics and Automated Systems
Methods: Sparse Evolutionary Training
