Simplifying Reward Design through Divide-and-Conquer
Ellis Ratner, Dylan Hadfield-Menell, Anca D. Dragan

TL;DR
This paper presents a divide-and-conquer method for reward design in robot planning and reinforcement learning: the designer specifies a reward separately for each environment, and a common reward is inferred from those specifications, which reduces design time and effort and improves solution quality.
Contribution
The paper introduces a novel approach to reward design that treats environment-specific rewards as observations to infer a unified reward, reducing complexity and user effort.
Findings
The divide-and-conquer method is faster and easier to use than designing a single reward jointly across all environments.
In the user studies, it also produces higher-quality solutions than joint design.
Independent reward design performs best when the problem can be divided into simpler subproblems.
Abstract
Designing a good reward function is essential to robot planning and reinforcement learning, but it can also be challenging and frustrating. The reward needs to work across multiple different environments, and that often requires many iterations of tuning. We introduce a novel divide-and-conquer approach that enables the designer to specify a reward separately for each environment. By treating these separate reward functions as observations about the underlying true reward, we derive an approach to infer a common reward across all environments. We conduct user studies in an abstract grid world domain and in a motion planning domain for a 7-DOF manipulator that measure user effort and solution quality. We show that our method is faster, easier to use, and produces a higher quality solution than the typical method of designing a reward jointly across all environments. We additionally…
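The idea of treating per-environment rewards as observations of one underlying reward can be illustrated with a small sketch. The abstract does not give the observation model, so everything below is an assumption made for illustration: rewards are linear in trajectory features, the hypothesis space is a small discrete set of weight vectors, and a proxy reward is more likely to be chosen when the trajectory it induces scores well under the true reward (a softmax with a hypothetical rationality parameter `beta`). The function names `best_features`, `log_likelihood`, and `infer_common_reward` are likewise hypothetical.

```python
import numpy as np

def best_features(w, env_features):
    """Features of the trajectory that maximizes the linear reward w in one environment."""
    return env_features[np.argmax(env_features @ w)]

def log_likelihood(proxy_w, theta, env_features, candidates, beta=5.0):
    """log P(proxy_w | theta, env) under the ASSUMED softmax observation model."""
    scores = np.array([beta * (theta @ best_features(w, env_features))
                       for w in candidates])
    chosen = beta * (theta @ best_features(proxy_w, env_features))
    # Stable log-sum-exp over all proxies the designer could have picked.
    log_norm = scores.max() + np.log(np.exp(scores - scores.max()).sum())
    return chosen - log_norm

def infer_common_reward(proxies, envs, candidates, beta=5.0):
    """Posterior over candidate true rewards, treating each per-environment
    proxy as an independent observation (uniform prior, an assumption)."""
    log_post = np.zeros(len(candidates))
    for proxy_w, env_features in zip(proxies, envs):
        for k, theta in enumerate(candidates):
            log_post[k] += log_likelihood(proxy_w, theta, env_features,
                                          candidates, beta)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Toy setup (made up): feature 0 = progress to goal, feature 1 = hazard contact.
candidates = np.array([[1.0, 0.0], [1.0, -1.0], [1.0, 1.0]])
env_a = np.array([[1.0, 0.0], [0.0, 0.0]])   # no hazard present
env_b = np.array([[1.0, 1.0], [0.5, 0.0]])   # short path crosses the hazard
proxies = [candidates[0], candidates[1]]     # designer's reward for each env

posterior = infer_common_reward(proxies, [env_a, env_b], candidates)
print(posterior)
```

In this toy setup the first environment is uninformative about the hazard feature, because every candidate induces the same trajectory there, so the second environment's proxy determines the posterior and most of the mass falls on the hazard-penalizing candidate.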
