DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks
Tongzhou Mu, Minghua Liu, Hao Su

TL;DR
DrS introduces a data-driven method to learn reusable dense rewards for multi-stage tasks, reducing human effort and improving RL performance across diverse unseen tasks.
Contribution
We propose DrS, a novel approach that learns dense rewards from sparse rewards and demonstrations, enabling reward reuse in unseen multi-stage tasks.
Findings
Reused learned rewards improve RL performance and sample efficiency.
Rewards achieve comparable results to human-engineered rewards.
The method is evaluated on three robot manipulation task families with 1000+ task variants.
Abstract
The success of many RL techniques heavily relies on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structures of the task, DrS learns a high-quality dense reward from sparse rewards and demonstrations if given. The learned rewards can be reused in unseen tasks, thus reducing the human effort for reward engineering. Extensive experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks, resulting in improved performance and sample efficiency of RL algorithms. The learned rewards even achieve comparable performance to human-engineered…
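To make the idea of a stage-based dense reward concrete, here is a minimal Python sketch. It is not the authors' implementation: the additive form (stage index plus a bounded learned score), the `tanh` squashing, the default `alpha`, and the names `stage_dense_reward` and `stage_scorers` are all assumptions chosen for illustration, and the toy scorers merely stand in for learned per-stage models.

```python
import math
from typing import Callable, Sequence

State = Sequence[float]
Scorer = Callable[[State], float]


def stage_dense_reward(
    state: State,
    stage_index: int,
    stage_scorers: Sequence[Scorer],
    alpha: float = 1.0 / 3.0,
) -> float:
    """Hypothetical dense reward: stage index plus a bounded per-stage score.

    Assumption: `stage_index` comes from the task's stage indicators, and
    `stage_scorers[k]` is a learned model scoring progress within stage k.
    """
    if stage_index >= len(stage_scorers):
        # All stages completed: nothing left to score.
        return float(stage_index)
    score = stage_scorers[stage_index](state)
    # tanh bounds the learned term to [-1, 1]; with alpha < 0.5, a state in a
    # later stage always receives a higher reward than one in an earlier stage.
    return stage_index + alpha * math.tanh(score)


# Toy stand-ins for learned scorers (illustrative only).
toy_scorers = [lambda s: -abs(s[0] - 1.0), lambda s: -abs(s[1] - 2.0)]

early = stage_dense_reward([1.0, 0.0], 0, toy_scorers)   # best case in stage 0
later = stage_dense_reward([1.0, 50.0], 1, toy_scorers)  # poor state in stage 1
done = stage_dense_reward([1.0, 2.0], 2, toy_scorers)    # all stages finished
```

In this sketch the stage index sets the coarse ordering, so even a poor state in a later stage outranks the best state in an earlier one, while the bounded term supplies the dense signal within a stage.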
Peer Reviews
Decision: ICLR 2024 poster
* As far as I am aware, the overall design of the algorithm (exploiting stage indicators) as well as the form of the learned reward function are novel.
* The method significantly reduces the amount of engineering required to learn reward functions when the task can be broken down into identifiable stages.
* DrS handily outperforms several reasonable baselines and competes with the hand-engineered reward in some cases.
* In addition to the ablation study in the body of the paper, the appendix inc
* The paper states that demonstrations are optional, but it sounds like they were used in all experiments. I imagine that sample efficiency would deteriorate substantially if no demonstrations are provided and the first stage cannot be solved easily by random exploration, or more generally if any stage cannot be easily solved by a noisy version of a policy that solves the previous stage.
* It is not clear that all sparse-reward tasks can be broken up into stages, and as shown in the ablation stu
- This paper makes a contribution towards automating reward design, which is of paramount importance in the field of RL. Having access to dense rewards takes the burden off of exploration, which in turn reduces the number of samples required to solve a task.
- The method is a niche application of contrastive discriminator learning, which is well-established in the literature.
- The method requires success and failure trajectories for each stage in the training data, which can be expensive to collect.
- The scope of the method is limited to a family of tasks that can be divided into stages. This prevents it from being applied to other tasks such as locomotion. It also means the method is less general compared to LLM-based rewards with external knowledge [1, 2].
- Similarly, the need for stage indicators prevents the method from scaling to real-world problems, which w
- The goal of this work is deriving a dense reward function from an array of training tasks to be repurposed for new, unseen tasks.
- The notion of capturing representations for a `task family` is important for enabling RL agents to learn multi-purpose policies.
- Operating on the understanding that tasks can be broken down into several segments is also logical.
- I like this paper because I think it's important to move away from engineered dense rewards, to more tangible methodologies for lear
- An ablation study of the method's robustness to bad demonstrations (e.g., suboptimal or noisy ones) would be useful.
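One of the reviews above notes that the method needs success and failure trajectories for each stage. As a rough illustration of what that could mean in practice, the Python sketch below partitions trajectories by the furthest stage they reach. The labeling rule (a trajectory counts as a success for stage k if it progresses past stage k, and as a failure otherwise) and the name `split_by_stage` are assumptions made for this example, not details confirmed by the summary.

```python
from typing import Dict, List, Tuple


def split_by_stage(
    trajectories: List[List[int]], num_stages: int
) -> Dict[int, Tuple[List[int], List[int]]]:
    """Group trajectory ids into (success, failure) sets for each stage.

    Each trajectory is a list of per-step stage indices, assumed to come from
    the task's stage indicators. Hypothetical rule: for stage k, a trajectory
    is a success if it ever reaches a stage beyond k, else a failure.
    """
    buckets: Dict[int, Tuple[List[int], List[int]]] = {}
    for k in range(num_stages):
        success = [i for i, stages in enumerate(trajectories) if max(stages) > k]
        failure = [i for i, stages in enumerate(trajectories) if max(stages) <= k]
        buckets[k] = (success, failure)
    return buckets


# Three toy trajectories: stuck in stage 0, reaches stage 1, reaches stage 2.
toy_trajectories = [[0, 0, 0], [0, 1, 1], [0, 1, 2]]
buckets = split_by_stage(toy_trajectories, num_stages=2)
```

Under this rule a single trajectory can serve as a success example for an early stage and a failure example for a later one, which is one way sparse stage signals might be turned into contrastive training data.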
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
