TTR-Based Reward for Reinforcement Learning with Implicit Model Priors
Xubo Lyu, Mo Chen

TL;DR
This paper introduces time-to-reach-based (TTR-based) reward shaping, inspired by optimal control, to improve data efficiency in model-free reinforcement learning for high-dimensional robotic tasks.
Contribution
It proposes a novel TTR-based reward shaping method that leverages approximate system models to enhance data efficiency without modifying existing RL algorithms.
Findings
Data efficiency improves significantly on simulated robotic tasks.
Compatible with various RL algorithms and easy to integrate.
Effective in high-dimensional state spaces with approximate models.
Abstract
Model-free reinforcement learning (RL) is a powerful approach for learning control policies directly from high-dimensional states and observations. However, it tends to be data-inefficient, which is especially costly in robotic learning tasks. On the other hand, optimal control does not require data if the system model is known, but cannot scale to models with high-dimensional states and observations. To exploit the benefits of both model-free RL and optimal control, we propose time-to-reach-based (TTR-based) reward shaping, an optimal control-inspired technique to alleviate data inefficiency while retaining the advantages of model-free RL. This is achieved by summarizing key system model information using a TTR function to greatly speed up the RL process, as shown in our simulation results. The TTR function is defined as the minimum time required to move from any state to the goal under assumed…
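The idea of a TTR function used as a reward can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the point robot that moves at a fixed top speed on an 8-connected grid, the use of Dijkstra's algorithm in place of an optimal-control solver, the choice of negative TTR as the reward, and the names `compute_ttr`, `ttr_reward`, and `TTR_TABLE`.

```python
import heapq
import math


def compute_ttr(width, height, goal, speed=1.0, cell=1.0):
    """Approximate the minimum time to reach `goal` from every grid cell.

    Assumption: a hypothetical point robot moving at `speed` along any of the
    8 grid directions. Dijkstra's algorithm stands in for an optimal-control
    solver on this simplified model.
    """
    ttr = [[math.inf] * width for _ in range(height)]
    gx, gy = goal
    ttr[gy][gx] = 0.0
    frontier = [(0.0, gx, gy)]
    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             if (dx, dy) != (0, 0)]
    while frontier:
        t, x, y = heapq.heappop(frontier)
        if t > ttr[y][x]:
            continue  # stale queue entry; a shorter time was already found
        for dx, dy in moves:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                # Travel time = distance between cell centers / speed.
                nt = t + cell * math.hypot(dx, dy) / speed
                if nt < ttr[ny][nx]:
                    ttr[ny][nx] = nt
                    heapq.heappush(frontier, (nt, nx, ny))
    return ttr


def ttr_reward(ttr, state):
    """Assumed shaping rule: states closer (in time) to the goal score higher."""
    x, y = state
    return -ttr[y][x]


# Precompute the table once, then query it at every RL step.
TTR_TABLE = compute_ttr(5, 5, goal=(4, 4))
```

Because the table is computed once from the assumed model and only looked up during training, a reward of this kind can be supplied to an RL algorithm without changing the algorithm itself.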
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Adaptive Dynamic Programming Control
