Evolving Rewards to Automate Reinforcement Learning

Aleksandra Faust; Anthony Francis; Dar Mehta

arXiv:1905.07628 · cs.LG · May 21, 2019 · 22 cites

Open Access

TL;DR

AutoRL automates reward tuning in reinforcement learning by using evolutionary algorithms to optimize rewards, leading to improved performance on complex continuous control tasks without manual reward engineering.

Contribution

The paper introduces AutoRL, an evolutionary framework that automates reward search in RL, reducing manual tuning and improving performance on complex tasks.

Findings

01

AutoRL outperforms baseline methods on MuJoCo tasks.

02

The biggest improvements occur on more complex control tasks.

03

AutoRL reduces the need for hand-crafted reward functions.

Abstract

Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious hand-tuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task objective. AutoRL, evaluated on four MuJoCo continuous control tasks over two RL algorithms, shows improvements over baselines, with the biggest uplift for more complex tasks. The video can be found at: https://youtu.be/svdaOFfQyC8.
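The idea in the abstract, an outer evolutionary loop that searches over reward parameters while an inner loop trains an agent on each candidate reward, can be illustrated with a toy sketch. This is not the paper's implementation: the one-dimensional environment, the two reward weights, the grid search standing in for an RL training run, and every function name below are assumptions made for illustration, and the selection scheme (keep the top half, mutate with Gaussian noise) is a generic evolutionary strategy rather than the one the authors necessarily used.

```python
import random

GOAL = 1.0   # hypothetical target position
STEPS = 20   # hypothetical episode length


def rollout(gain):
    """Simulate a point moving toward GOAL under a proportional policy.

    Returns (final distance to goal, accumulated squared control effort).
    """
    x, effort = 0.0, 0.0
    for _ in range(STEPS):
        u = gain * (GOAL - x)
        x += u
        effort += u * u
    return abs(GOAL - x), effort


def task_objective(gain):
    """The easily stated objective: end close to the goal (higher is better)."""
    distance, _ = rollout(gain)
    return -distance


def shaped_return(gain, weights):
    """Parameterized proxy reward the agent is actually trained on."""
    distance, effort = rollout(gain)
    return -weights[0] * distance - weights[1] * effort


def train_agent(weights):
    """Stand-in for an RL training run (assumption): pick the policy gain
    that maximizes the shaped return via a deterministic grid search."""
    candidates = [i / 50 for i in range(1, 51)]  # gains 0.02, 0.04, ..., 1.0
    return max(candidates, key=lambda g: shaped_return(g, weights))


def evolve_rewards(population_size=8, generations=5, seed=0):
    """Evolve reward weights so that trained agents maximize the task objective."""
    rng = random.Random(seed)
    population = [[rng.uniform(0, 1), rng.uniform(0, 1)]
                  for _ in range(population_size)]
    history = []  # best task objective seen in each generation
    for _ in range(generations + 1):
        # Fitness of a reward = task objective of the agent trained on it.
        scored = sorted(population,
                        key=lambda w: task_objective(train_agent(w)),
                        reverse=True)
        history.append(task_objective(train_agent(scored[0])))
        # Keep the top half unchanged and refill with mutated copies.
        elites = scored[: population_size // 2]
        children = [[max(0.0, v + rng.gauss(0, 0.1)) for v in rng.choice(elites)]
                    for _ in range(population_size - len(elites))]
        population = elites + children
    return scored[0], history


if __name__ == "__main__":
    best_weights, history = evolve_rewards()
    print("best reward weights:", best_weights)
    print("best task objective per generation:", history)
```

Note that the fitness of a reward candidate is measured on the task objective, not on the shaped reward itself; the shaped reward only guides training. Because the elites are carried over unchanged and the stand-in training is deterministic, the best objective in this sketch never decreases from one generation to the next.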

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robot Manipulation and Learning