Reward Shaping with Dynamic Trajectory Aggregation
Takato Okudo, Seiji Yamada

TL;DR
This paper introduces a dynamic trajectory aggregation method using subgoal series for reward shaping in reinforcement learning, significantly improving learning efficiency in complex environments with minimal designer effort.
Contribution
It proposes a novel trajectory aggregation technique based on subgoal series that simplifies reward shaping in high-dimensional environments, overcoming limitations of existing methods.
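One way to picture aggregation by subgoal series is sketched below. This is an illustrative reading rather than the authors' implementation: it assumes the abstract state is simply the count of subgoals reached so far in the current episode, and the names `SubgoalAggregator`, `is_subgoal`, and `abstract_state` are hypothetical.

```python
class SubgoalAggregator:
    """Maps each observation in an episode to an abstract state index.

    Assumption (not confirmed by the summary): the abstract state is the
    number of subgoals, taken in order, that the episode has reached so far.
    """

    def __init__(self, subgoals, is_subgoal):
        self.subgoals = list(subgoals)   # ordered subgoal series
        self.is_subgoal = is_subgoal     # callable(observation, subgoal) -> bool
        self.index = 0                   # subgoals reached in this episode

    def reset(self):
        """Call at the start of every episode."""
        self.index = 0

    def abstract_state(self, observation):
        """Advance when the next pending subgoal is reached, then return the index."""
        pending = self.index < len(self.subgoals)
        if pending and self.is_subgoal(observation, self.subgoals[self.index]):
            self.index += 1
        return self.index


# Toy usage: integer observations, subgoals matched by equality.
agg = SubgoalAggregator(subgoals=[3, 7], is_subgoal=lambda obs, sg: obs == sg)
trajectory = [0, 1, 3, 4, 7, 8]
abstract_states = [agg.abstract_state(obs) for obs in trajectory]
```

In this sketch the designer supplies only the subgoal series and a test for whether an observation matches a subgoal, so no aggregation function over the full state space has to be written by hand.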
Findings
Outperformed baseline RL algorithms in three diverse domains.
Reduced designer effort in defining state aggregations.
Enhanced learning efficiency in environments with high-dimensional observations.
Abstract
Reinforcement learning, which acquires a policy that maximizes long-term rewards, has been actively studied. Unfortunately, it is often too slow and difficult to use in practical situations because the state-action space becomes huge in real environments. Rewards are the essential factor in learning efficiency. Potential-based reward shaping is a basic method for enriching rewards. It requires a specific real-valued function, called a potential function, to be defined for every domain, and it is often difficult to represent the potential function directly. SARSA-RS avoids this by learning the potential function. However, SARSA-RS can only be applied to simple environments. The bottleneck of this method is the aggregation of states into abstract states, since it is almost impossible for designers to build an aggregation function for all states. We propose a trajectory…
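As a reminder of what potential-based reward shaping computes, the standard shaping term is F(s, s') = γΦ(s') − Φ(s), added to the environment reward. The sketch below is a minimal illustration of that formula only; the corridor potential, the `goal` parameter, and the function names are assumptions made for the example and are not taken from the paper.

```python
def potential(state, goal=4):
    """Hypothetical potential for a 1-D corridor: higher when closer to the goal."""
    return -abs(goal - state)


def shaped_reward(reward, state, next_state, gamma=0.99):
    """Environment reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


# One step toward the goal with no environment reward and gamma = 1.
step_bonus = shaped_reward(0.0, state=2, next_state=3, gamma=1.0)

# With gamma = 1 the shaping terms telescope to Phi(last) - Phi(first),
# regardless of the detours taken along the way.
path = [0, 1, 2, 1, 2, 3, 4]
total_bonus = sum(
    shaped_reward(0.0, s, s_next, gamma=1.0)
    for s, s_next in zip(path, path[1:])
)
```

The telescoping sum shows why the choice of Φ guides exploration without rewarding loops: moving away from the goal and back again contributes nothing net to the shaped return.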
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Evolutionary Algorithms and Applications
