Reinforcement Learning with Goal-Distance Gradient
Kai Jiang, XiaoLong Qin

TL;DR
This paper introduces a reward-free reinforcement learning method that uses state transition distances and goal-distance gradients to improve learning in sparse or rewardless environments, enhancing exploration and performance.
Contribution
It presents a novel model-free approach that replaces environmental rewards with state transition distances and introduces a goal-distance gradient for policy improvement.
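To make the idea of a transition-count distance concrete, here is a minimal sketch on a toy grid. It is not the authors' implementation. The grid, the `step` function, and the breadth-first search are assumptions chosen for illustration, and the greedy action choice is only a discrete stand-in for following a goal-distance gradient. The paper's method is model-free, whereas this toy computes distances from a known transition function.

```python
from collections import deque

# Hypothetical 4x4 grid; '#' marks a wall. Not taken from the paper.
GRID = [
    "....",
    ".##.",
    "....",
    "#...",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic transition: move if the target cell is free, else stay."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
        return (nr, nc)
    return state


def distances_to(goal):
    """Minimum number of transitions from every reachable state to `goal`.

    Breadth-first search outward from the goal. This is valid here only
    because every move in this toy grid is reversible.
    """
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for a in ACTIONS:
            n = step(s, a)
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return dist


def greedy_action(state, dist):
    """Pick the action whose successor state is closest to the goal."""
    return min(ACTIONS, key=lambda a: dist[step(state, a)])


goal = (3, 3)
dist = distances_to(goal)

# Follow the distance downhill from a start state; no reward is used anywhere.
start = (0, 0)
path = [start]
while path[-1] != goal:
    path.append(step(path[-1], greedy_action(path[-1], dist)))
```

In this sketch the distance table plays the role that a reward signal would otherwise play: each greedy step reduces the remaining transition count by one, so the agent reaches the goal without any environmental reward.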
Findings
Outperforms previous methods in sparse reward environments
Effectively solves reward-free and complex tasks
Improves exploration efficiency through bridge point planning
Abstract
Reinforcement learning usually uses reward feedback from the environment to train agents. However, rewards in real environments are sparse, and some environments provide no rewards at all. Most current methods struggle to perform well in sparse-reward or reward-free environments. Although shaped rewards are effective for sparse-reward tasks, they are limited to specific problems, and learning with them is susceptible to local optima. We propose a model-free method that does not rely on environmental rewards to solve the sparse-reward problem in general environments. Our method uses the minimum number of transitions between states as a distance that replaces the environmental reward, and proposes a goal-distance gradient to achieve policy improvement. We also introduce a bridge point planning method based on the characteristics of our method to improve…
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Multi-Objective Optimization Algorithms
