Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data

Aleksandra Malysheva; Daniel Kudenko; Aleksei Shpilman

arXiv:2012.08824·cs.LG·December 17, 2020

TL;DR

This paper uses videos of human running to shape the reward function of a reinforcement learning agent. The shaped agent, a simulated two-legged humanoid model, runs twice as fast after 12 hours of training and is not limited by sub-optimal movement patterns in the human demonstrations.

Contribution

The paper combines potential-based reward shaping (PBRS) with the positions of key body parts extracted from videos of human running, using them to define a potential function that speeds up reinforcement learning of humanoid locomotion.

Findings

01. Video-based reward shaping doubles the agent's running speed within 12 hours of training.

02. The learned policy outperforms the running behaviour shown in the source videos.

03. The agent combines techniques from the top approaches in the NIPS 2017 "Learning to run" competition.

Abstract

Learning to produce efficient movement behaviour for humanoid robots from scratch is a hard problem, as has been illustrated by the "Learning to run" competition at NIPS 2017. The goal of this competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed. All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour. In this paper, we demonstrate how data from videos of human running (e.g. taken from YouTube) can be used to shape the reward of the humanoid learning agent to speed up the learning and produce a better result. Specifically, we are using the positions of key body parts at regular time intervals to define a potential function for potential-based reward shaping (PBRS). Since PBRS does not change the optimal policy, this approach allows…
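
The abstract describes defining a potential function from the positions of key body parts, but this excerpt does not give its exact form. The following is a minimal sketch of the general PBRS idea, not the authors' implementation. It assumes a hypothetical potential equal to the negative summed Euclidean distance between the agent's key points and a reference pose taken from a video frame, and uses the standard shaping term F(s, s') = γΦ(s') − Φ(s). The function names, the `scale` parameter, the discount value, and the toy coordinates are all illustrative assumptions.

```python
import math

GAMMA = 0.99  # discount factor; an assumed value, not taken from the paper


def potential(body_positions, reference_positions, scale=1.0):
    """Hypothetical potential Phi(s): larger (closer to 0) when the agent's
    key body parts are nearer to the reference pose from the video."""
    total_distance = sum(
        math.dist(p, q) for p, q in zip(body_positions, reference_positions)
    )
    return -scale * total_distance


def shaped_reward(env_reward, phi_s, phi_next, gamma=GAMMA):
    """Standard potential-based shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    return env_reward + gamma * phi_next - phi_s


if __name__ == "__main__":
    # Toy 2D key points (e.g. hip, knee, foot); coordinates are made up.
    reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    current = [(x + 0.3, y + 0.4) for x, y in reference]  # off the reference
    after_step = reference                                # matches the reference

    phi_s = potential(current, reference)
    phi_next = potential(after_step, reference)
    # Moving toward the reference pose adds a positive shaping bonus.
    print(shaped_reward(1.0, phi_s, phi_next))
```

Because the shaping term is a difference of potentials, it telescopes along any trajectory, which is why PBRS leaves the optimal policy unchanged and the agent is free to depart from the demonstrated motion when that yields more environment reward.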
