Attention-Based Reward Shaping for Sparse and Delayed Rewards
Ian Holmes, Min Chi

TL;DR
This paper introduces ARES, an attention-based reward shaping method that transforms sparse, delayed rewards into dense signals, significantly improving reinforcement learning performance in challenging environments.
Contribution
ARES is the first fully offline, robust reward shaping algorithm using transformers, effective with minimal data and applicable across various environments and RL algorithms.
Findings
ARES improves learning when rewards are delayed
It works with small datasets and with episodes produced by agents taking random actions
It significantly improves training efficiency in sparse-reward environments
Abstract
Sparse and delayed reward functions pose a significant obstacle for real-world Reinforcement Learning (RL) applications. In this work, we propose Attention-based REward Shaping (ARES), a general and robust algorithm which uses a transformer's attention mechanism to generate shaped rewards and create a dense reward function for any environment. ARES requires a set of episodes and their final returns as input. It can be trained entirely offline and is able to generate meaningful shaped rewards even when using small datasets or episodes produced by agents taking random actions. ARES is compatible with any RL algorithm and can handle any level of reward sparsity. In our experiments, we focus on the most challenging case where rewards are fully delayed until the end of each episode. We evaluate ARES across a diverse range of environments, widely used RL algorithms, and baseline methods to…
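The core idea of turning one end-of-episode return into a dense per-step signal can be illustrated with a small sketch. This is not the ARES algorithm: the abstract does not specify how attention is converted into rewards, so the code below simply assumes a hypothetical list of per-timestep relevance scores (standing in for whatever a trained transformer's attention would supply) and splits the final return in proportion to their softmax. The function names, the scores, and the sum-preserving redistribution are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def redistribute_return(final_return, scores):
    """Split one delayed episode return across timesteps.

    `scores` is a hypothetical per-timestep relevance score (an assumption
    of this sketch, not something defined in the abstract). Each timestep
    receives a share of `final_return` proportional to softmax(scores).
    """
    weights = softmax(scores)
    return [final_return * w for w in weights]


# Hypothetical 4-step episode whose only reward (10.0) arrives at the end.
scores = [0.1, 2.0, 0.3, 1.2]
shaped = redistribute_return(10.0, scores)

# The shaped rewards are dense (one per step) and, in this sketch,
# sum back to the original delayed return.
print(shaped, sum(shaped))
```

In this sketch the step with the highest score receives the largest share of the return. An RL algorithm could then be trained on the per-step values in place of the single delayed reward, which is the kind of substitution the abstract describes.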
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Emotion and Mood Recognition
Methods: Softmax · Attention Is All You Need · Focus · Sparse Evolutionary Training
