TTR-Based Reward for Reinforcement Learning with Implicit Model Priors

Xubo Lyu; Mo Chen

arXiv:1903.09762 · cs.RO · October 14, 2020 · 1 citation

Open Access

TL;DR

This paper introduces TTR-based reward shaping, inspired by optimal control, to improve data efficiency in model-free reinforcement learning for high-dimensional robotic tasks.

Contribution

It proposes a novel TTR-based reward shaping method that leverages approximate system models to enhance data efficiency without modifying existing RL algorithms.
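The claim that existing RL algorithms need no modification can be illustrated with a minimal sketch. The shaped reward is computed outside the learner, so any algorithm that consumes (next_state, reward, done) tuples can use it unchanged. Everything in the sketch is an assumption for illustration and not the authors' code. The approximate model is a hypothetical 1D single integrator x' = u with |u| <= v_max, whose time to reach the goal is |x - goal| / v_max. The `toy_step` environment is made up, and so is the choice of -TTR(s') as the reward. This summary does not give the exact reward formula the paper uses.

```python
from typing import Callable, Tuple

# Hypothetical types: a scalar state and a step function returning
# (next_state, reward, done), as most RL training loops expect.
State = float
StepFn = Callable[[State, float], Tuple[State, float, bool]]


def ttr_1d(x: State, goal: float = 0.0, v_max: float = 1.0) -> float:
    """Minimum time to reach `goal` for the assumed model x' = u, |u| <= v_max."""
    return abs(x - goal) / v_max


def toy_step(state: State, action: float, dt: float = 0.1) -> Tuple[State, float, bool]:
    """Hypothetical environment with a sparse reward: 1 at the goal, else 0."""
    u = max(-1.0, min(1.0, action))  # actuator limit |u| <= 1
    next_state = state + dt * u
    done = abs(next_state) < 0.05
    return next_state, (1.0 if done else 0.0), done


def ttr_shaped_step(step: StepFn, ttr: Callable[[State], float]) -> StepFn:
    """Replace the environment reward with -TTR(next_state).

    The learner only ever calls the returned function, so the RL algorithm
    itself is left untouched (assumed shaping rule, for illustration only).
    """
    def shaped(state: State, action: float) -> Tuple[State, float, bool]:
        next_state, _sparse_reward, done = step(state, action)
        return next_state, -ttr(next_state), done
    return shaped


shaped_step = ttr_shaped_step(toy_step, ttr_1d)
s, r, done = shaped_step(1.0, -1.0)  # one step toward the goal at x = 0
print(s, r, done)
```

The sparse reward of `toy_step` is zero almost everywhere. The value -TTR instead rises toward zero as the state approaches the goal, so every step carries a learning signal derived from the approximate model.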

Findings

01

Significant improvements in data efficiency in simulated robotic tasks.

02

Compatible with various RL algorithms and easy to integrate.

03

Effective in high-dimensional state spaces, even when only approximate system models are available.

Abstract

Model-free reinforcement learning (RL) is a powerful approach for learning control policies directly from high-dimensional states and observations. However, it tends to be data-inefficient, which is especially costly in robotic learning tasks. On the other hand, optimal control does not require data if the system model is known, but cannot scale to models with high-dimensional states and observations. To exploit the benefits of both model-free RL and optimal control, we propose time-to-reach-based (TTR-based) reward shaping, an optimal control-inspired technique to alleviate data inefficiency while retaining the advantages of model-free RL. This is achieved by summarizing key system model information using a TTR function to greatly speed up the RL process, as shown in our simulation results. The TTR function is defined as the minimum time required to move from any state to the goal under assumed…
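A discretized toy problem can illustrate the TTR definition: the minimum time to move from any state to the goal under an assumed model. The sketch below is not the authors' solver. It swaps in Dijkstra's algorithm on a grid as a discrete stand-in for computing a time-to-reach function. The function name `grid_ttr`, its parameters, and the model are hypothetical: a point robot moves between 8-connected cells at constant speed. Every move is reversible at the same cost, so a search that expands outward from the goal gives each cell's time to reach the goal.

```python
import heapq
import math
from typing import Dict, FrozenSet, Tuple

Cell = Tuple[int, int]


def grid_ttr(width: int, height: int, goal: Cell, speed: float = 1.0,
             cell_size: float = 1.0,
             obstacles: FrozenSet[Cell] = frozenset()) -> Dict[Cell, float]:
    """Minimum time to reach `goal` from every reachable free cell.

    Assumed model: a point robot moving between 8-connected grid cells at a
    constant `speed`. Moves are symmetric, so searching outward from the goal
    yields the time from each cell to the goal. Unreachable cells are omitted
    (their TTR is effectively infinite).
    """
    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    ttr: Dict[Cell, float] = {goal: 0.0}
    frontier = [(0.0, goal)]
    while frontier:
        t, (x, y) = heapq.heappop(frontier)
        if t > ttr[(x, y)]:
            continue  # stale heap entry: a faster route was already found
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in obstacles:
                continue
            # Travel time for this move: distance covered divided by speed.
            nt = t + cell_size * math.hypot(dx, dy) / speed
            if nt < ttr.get(nxt, math.inf):
                ttr[nxt] = nt
                heapq.heappush(frontier, (nt, nxt))
    return ttr


ttr = grid_ttr(5, 5, goal=(0, 0))
print(ttr[(4, 0)], ttr[(3, 3)])
```

To use such a table as a dense, model-informed reward signal, a continuous state could be snapped to its nearest cell and the stored value looked up. A grid like this only suits a low-dimensional approximate model. This summary does not show how the paper obtains its TTR function for a given system.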


Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Adaptive Dynamic Programming Control