Reinforcement Learning with Goal-Distance Gradient

Kai Jiang; XiaoLong Qin

arXiv:2001.00127 · cs.LG · January 13, 2020

Open Access

TL;DR

This paper introduces a reward-free reinforcement learning method that uses state transition distances and goal-distance gradients to improve learning in sparse or rewardless environments, enhancing exploration and performance.

Contribution

It presents a novel model-free approach that replaces environmental rewards with state transition distances and introduces a goal-distance gradient for policy improvement.

Findings

01 Outperforms previous methods in sparse reward environments

02 Effectively solves non-reward and complex tasks

03 Improves exploration efficiency through bridge point planning

Abstract

Reinforcement learning usually uses feedback rewards from the environment to train agents. But rewards in real environments are sparse, and some environments provide no rewards at all. Most current methods struggle to perform well in sparse-reward or non-reward environments. Although shaped rewards are effective for sparse-reward tasks, they are limited to specific problems, and learning remains susceptible to local optima. We propose a model-free method that does not rely on environmental rewards to solve the sparse-reward problem in general environments. Our method uses the minimum number of transitions between states as a distance that replaces the environmental rewards, and proposes a goal-distance gradient to achieve policy improvement. We also introduce a bridge point planning method based on the characteristics of our method to improve…
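To make the idea of "minimum number of transitions between states as a distance" concrete, here is a minimal sketch in Python. It is not the paper's method: the paper is model-free and learns its distance from experience, whereas this sketch assumes a small, fully known, deterministic transition graph and computes the distance exactly with breadth-first search. The names `transition_distances`, `greedy_next_state`, and the toy graph are hypothetical, chosen only for illustration.

```python
from collections import deque


def transition_distances(transitions, goal):
    """Minimum number of transitions from each state to `goal`.

    `transitions` maps a state to the states reachable in one step.
    Assumption: the graph is known and deterministic (the paper does
    not assume this; it is only for illustration).
    """
    # Build the reversed graph so we can search backward from the goal.
    reverse = {}
    for state, successors in transitions.items():
        for nxt in successors:
            reverse.setdefault(nxt, []).append(state)

    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        current = queue.popleft()
        for prev in reverse.get(current, []):
            if prev not in dist:  # first visit gives the shortest distance
                dist[prev] = dist[current] + 1
                queue.append(prev)
    return dist


def greedy_next_state(transitions, dist, state):
    """Pick the successor closest to the goal; no reward signal is used."""
    reachable = [s for s in transitions.get(state, []) if s in dist]
    return min(reachable, key=dist.__getitem__) if reachable else None


# Hypothetical four-state environment with goal "D".
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["B"], "D": ["D"]}
toy_dist = transition_distances(toy_graph, "D")
toy_choice = greedy_next_state(toy_graph, toy_dist, "A")
```

In this toy graph the distance alone is enough to choose a step toward the goal from any state, which is the role environmental rewards would otherwise play. A learned, differentiable distance would be needed before anything like a goal-distance gradient could be applied.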

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Multi-Objective Optimization Algorithms