Reward Shaping with Dynamic Trajectory Aggregation

Takato Okudo; Seiji Yamada

arXiv:2104.06163·cs.LG·April 14, 2021

Open Access

TL;DR

This paper introduces a dynamic trajectory aggregation method using subgoal series for reward shaping in reinforcement learning, significantly improving learning efficiency in complex environments with minimal designer effort.

Contribution

It proposes a novel trajectory aggregation technique based on subgoal series that simplifies reward shaping in high-dimensional environments, overcoming limitations of existing methods.

Findings

01  Outperformed baseline RL algorithms in three diverse domains.

02  Reduced designer effort in defining state aggregations.

03  Enhanced learning efficiency in environments with high-dimensional observations.

Abstract

Reinforcement learning, which acquires a policy that maximizes long-term reward, has been actively studied. Unfortunately, it is often too slow and difficult to use in practical situations because the state-action space becomes huge in real environments. Rewards are the essential factor in learning efficiency. Potential-based reward shaping is a basic method for enriching rewards. It requires defining a real-valued function, called a potential function, for every domain, and it is often difficult to represent the potential function directly. SARSA-RS acquires the potential function by learning it. However, SARSA-RS can only be applied to simple environments. The bottleneck of this method is the aggregation of states into abstract states, since it is almost impossible for designers to build an aggregation function for all states. We propose a trajectory…
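The potential-based reward shaping mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard shaping term gamma * phi(s') - phi(s) added to the environment reward. It is not the authors' implementation: the function names, the toy chain environment, the distance-based potential, and the convention of treating the terminal potential as zero are all assumptions made for illustration.

```python
from typing import Callable, Hashable


def shaped_reward(
    reward: float,
    state: Hashable,
    next_state: Hashable,
    potential: Callable[[Hashable], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return r + gamma * phi(s') - phi(s).

    Assumption: the potential of a terminal state is treated as zero,
    a common convention that is not taken from the paper.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


GOAL = 4  # hypothetical goal position on a five-state chain (states 0..4)


def distance_potential(state: int) -> float:
    """Hand-written potential: negative distance to the goal (illustrative)."""
    return -float(GOAL - state)


# Moving toward the goal earns a positive shaping bonus,
# moving away from it earns a penalty of the same size (gamma = 1).
toward = shaped_reward(0.0, 2, 3, distance_potential, gamma=1.0)
away = shaped_reward(0.0, 3, 2, distance_potential, gamma=1.0)
```

A hand-written potential such as `distance_potential` is exactly what is hard to supply in large domains. The abstract says that SARSA-RS learns the potential over abstract states instead, and that building the state aggregation is then the bottleneck.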

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Evolutionary Algorithms and Applications