Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data
Aleksandra Malysheva, Daniel Kudenko, Aleksei Shpilman

TL;DR
This paper introduces a method that uses videos of human running to shape the reward function in reinforcement learning, improving the running speed of a simulated humanoid model while letting the agent overcome sub-optimal movement patterns present in the human footage.
Contribution
The paper presents a novel approach combining potential-based reward shaping with video data to enhance reinforcement learning for humanoid locomotion.
Findings
Video-based reward shaping doubles running speed in 12 hours of training
Because PBRS does not change the optimal policy, the agent can overcome sub-optimal movements shown in the human videos
Combines techniques from top NIPS competition approaches
Abstract
Learning to produce efficient movement behaviour for humanoid robots from scratch is a hard problem, as has been illustrated by the "Learning to run" competition at NIPS 2017. The goal of this competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed. All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour. In this paper, we demonstrate how data from videos of human running (e.g. taken from YouTube) can be used to shape the reward of the humanoid learning agent to speed up the learning and produce a better result. Specifically, we are using the positions of key body parts at regular time intervals to define a potential function for potential-based reward shaping (PBRS). Since PBRS does not change the optimal policy, this approach allows…
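The shaping idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard PBRS form F(s, s') = γΦ(s') − Φ(s) and assumes a potential equal to the negative summed distance between the agent's key body parts and a reference pose taken from a video frame. The function names, the 2-D coordinate layout, the distance-based potential, and the value of the discount factor are all hypothetical choices made for this example.

```python
import numpy as np

GAMMA = 0.99  # discount factor; an assumed value, not taken from the paper


def potential(agent_pos, reference_pos):
    """Hypothetical potential: closer to the reference pose means higher potential.

    Both arguments have shape (n_parts, 2), holding assumed (x, y)
    coordinates of key body parts for the agent and for one video frame.
    """
    agent_pos = np.asarray(agent_pos, dtype=float)
    reference_pos = np.asarray(reference_pos, dtype=float)
    # Sum of per-body-part Euclidean distances, negated.
    return -float(np.linalg.norm(agent_pos - reference_pos, axis=1).sum())


def shaped_reward(env_reward, phi_s, phi_next, gamma=GAMMA):
    """Environment reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    return env_reward + gamma * phi_next - phi_s


def discounted_shaping_sum(phis, gamma=GAMMA):
    """Discounted sum of shaping terms along one trajectory of potentials.

    The sum telescopes to gamma**T * phis[-1] - phis[0], so it depends only
    on the first and last potentials, not on the actions taken in between.
    """
    return sum(
        gamma**t * (gamma * phis[t + 1] - phis[t])
        for t in range(len(phis) - 1)
    )
```

The telescoping property in `discounted_shaping_sum` is the intuition behind the abstract's statement that PBRS does not change the optimal policy: the shaping terms add a quantity to the return that is fixed by the start and end potentials, so the human reference poses guide learning without forcing the agent to copy them.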
