PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations

Cheng Qian; Julen Urain; Kevin Zakka; Jan Peters

arXiv:2407.18178·cs.CV·July 26, 2024

TL;DR

PianoMime is a framework that leverages internet videos to train a generalist, dexterous piano-playing agent capable of performing any song, demonstrating promising generalization to unseen compositions.

Contribution

This work introduces a novel approach to train a generalist piano-playing agent using internet demonstrations, combining data extraction, policy learning, and distillation.

Findings

01

Achieved up to 56% F1 score on unseen songs.

02

Demonstrated the effectiveness of internet videos for training generalist agents.

03

Explored different policy designs and data amounts for improved generalization.
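For readers unfamiliar with the metric, the F1 score reported above can be illustrated on key presses. The sketch below is an assumption about how such a score might be computed, not the paper's evaluation code: it treats a song as a sequence of timesteps, each a row of 0/1 flags (one per key), and pools true positives, false positives, and false negatives over the whole sequence. The function name `key_press_f1` and the piano-roll layout are hypothetical.

```python
def key_press_f1(target, played):
    """F1 between two binary piano rolls (lists of timesteps, each a list of 0/1 per key).

    Illustrative only: counts are pooled over all timesteps and keys.
    """
    tp = fp = fn = 0
    for target_step, played_step in zip(target, played):
        for should_press, did_press in zip(target_step, played_step):
            if should_press and did_press:
                tp += 1  # correct key pressed
            elif did_press and not should_press:
                fp += 1  # wrong key pressed
            elif should_press and not did_press:
                fn += 1  # required key missed
    if tp + fp + fn == 0:
        return 1.0  # nothing to press and nothing pressed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two timesteps, three keys: one correct press, one wrong key, one missed key.
target = [[1, 0, 0], [0, 1, 0]]
played = [[1, 0, 0], [0, 0, 1]]
score = key_press_f1(target, played)
```

Under this definition, an agent that presses half of the required keys and makes no wrong presses has precision 1.0 and recall 0.5, which gives an F1 of about 0.67.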

Abstract

In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations. The internet is a promising source of large-scale demonstrations for training our robot agents. In particular, for the case of piano-playing, YouTube is full of videos of professional pianists playing a myriad of songs. In our work, we leverage these demonstrations to learn a generalist piano-playing agent capable of playing any arbitrary song. Our framework is divided into three parts: a data preparation phase to extract the informative features from the YouTube videos, a policy learning phase to train song-specific expert policies from the demonstrations, and a policy distillation phase to distil the policies into a single generalist agent. We explore different policy designs to represent the agent and evaluate the influence of the amount of training data on the…
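The policy distillation phase described in the abstract can be pictured as supervised learning on data pooled from the song-specific experts. The toy sketch below is not the PianoMime implementation; it assumes, purely for illustration, that each expert maps a scalar goal to a scalar action and that the generalist is a linear model fitted by least squares. The names `distill`, `experts`, and `goals_per_expert` are hypothetical.

```python
def distill(experts, goals_per_expert):
    """Fit one linear policy to (goal, action) pairs pooled from several experts.

    Toy stand-in for distillation: experts are plain functions, and the
    generalist is action = slope * goal + intercept, fitted in closed form.
    """
    goals, actions = [], []
    for expert, expert_goals in zip(experts, goals_per_expert):
        for goal in expert_goals:
            goals.append(goal)
            actions.append(expert(goal))  # query the expert for its action

    n = len(goals)
    mean_goal = sum(goals) / n
    mean_action = sum(actions) / n
    variance = sum((g - mean_goal) ** 2 for g in goals)
    covariance = sum(
        (g - mean_goal) * (a - mean_action) for g, a in zip(goals, actions)
    )
    slope = covariance / variance if variance else 0.0
    intercept = mean_action - slope * mean_goal
    return lambda goal: slope * goal + intercept


# Two "song-specific" experts that each cover a different range of goals.
experts = [lambda g: 2 * g + 1, lambda g: 2 * g + 1]
goals_per_expert = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
generalist = distill(experts, goals_per_expert)
unseen_action = generalist(10.0)  # goal outside every expert's training range
```

The point of the sketch is the data flow: no single expert covers every goal, but a single model trained on their pooled outputs can be queried on goals none of them saw. A realistic agent would replace the linear model with a learned policy network and the scalar goal with a representation of the song.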
