Evolving Rewards to Automate Reinforcement Learning
Aleksandra Faust, Anthony Francis, Dar Mehta

TL;DR
AutoRL automates reward tuning in reinforcement learning by using evolutionary algorithms to optimize rewards, leading to improved performance on complex continuous control tasks without manual reward engineering.
Contribution
The paper introduces AutoRL, an evolutionary framework that automates reward search in RL, reducing manual tuning and improving performance on complex tasks.
Findings
AutoRL outperforms baseline methods on four MuJoCo continuous control tasks across two RL algorithms.
The biggest improvements occur on more complex control tasks.
AutoRL reduces the need for hand-crafted reward functions.
Abstract
Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious hand-tuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task objective. AutoRL, evaluated on four MuJoCo continuous control tasks over two RL algorithms, shows improvements over baselines, with the biggest uplift for more complex tasks. The video can be found at: https://youtu.be/svdaOFfQyC8.
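
The idea in the abstract, a population of agents trained under different parameterized rewards and scored on the true task objective, can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: `train_agent` is a hypothetical stand-in for a full RL training run (it simply picks the action that maximizes the shaped reward), the two reward weights and the target value are invented, and the search is plain truncation selection with Gaussian mutation, which may differ from the optimizer used in the paper.

```python
import random


def train_agent(weights):
    """Hypothetical stand-in for an RL training run.

    Returns the action (from a fixed grid) that maximizes the shaped reward
    defined by `weights`; a real system would train a policy instead.
    """
    w_progress, w_effort = weights
    actions = [i / 10 for i in range(51)]  # 0.0, 0.1, ..., 5.0

    def shaped_reward(a):
        return w_progress * a - w_effort * a * a

    return max(actions, key=shaped_reward)


def task_objective(action):
    """Toy task objective: be as close as possible to an assumed target of 3.0."""
    return -abs(action - 3.0)


def evolve_rewards(generations=20, population_size=16, elite=4, sigma=0.3, seed=0):
    """Search over reward weights; each candidate is scored on the task objective."""
    rng = random.Random(seed)
    population = [(rng.uniform(0.0, 1.0), rng.uniform(0.1, 1.0))
                  for _ in range(population_size)]
    history = []  # best task objective seen in each generation
    best_score, best_weights = None, None
    for _ in range(generations):
        # "Train" one agent per reward candidate and rank by the task objective,
        # not by the shaped reward the agent was trained on.
        scored = sorted(((task_objective(train_agent(w)), w) for w in population),
                        reverse=True)
        best_score, best_weights = scored[0]
        history.append(best_score)
        # Keep the elite candidates unchanged and refill with mutated copies.
        parents = [w for _, w in scored[:elite]]
        children = [tuple(max(0.0, p + rng.gauss(0.0, sigma))
                          for p in rng.choice(parents))
                    for _ in range(population_size - elite)]
        population = parents + children
    return best_score, best_weights, history
```

Calling `evolve_rewards()` returns the best task-objective score found, the reward weights that produced it, and the per-generation best scores. Because the elite candidates are carried over unchanged and the toy training step is deterministic, the per-generation best never decreases; with real, stochastic RL training that guarantee would not hold.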
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robot Manipulation and Learning
