Beyond Single-Step Updates: Reinforcement Learning of Heuristics with Limited-Horizon Search
Gal Hadar, Forest Agostinelli, Shahaf S. Shperberg

TL;DR
This paper introduces a reinforcement learning method for learning heuristics in shortest-path problems. The method uses limited-horizon search to update heuristics, going beyond traditional single-step Bellman updates.
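For reference, the single-step Bellman update used in deep approximate value iteration can be written as below. The notation is assumed here for illustration and is not taken from the paper:

- A(s) is the set of actions available in s.
- c(s,a) is the edge cost.
- T(s,a) is the successor state.
- 𝒢 is the set of goal states.

A non-goal state's heuristic is set from its best neighbor, and goal states are pinned to zero.

```latex
h(s) \leftarrow
\begin{cases}
0, & s \in \mathcal{G},\\[4pt]
\min_{a \in A(s)} \bigl[\, c(s,a) + h\bigl(T(s,a)\bigr) \bigr], & \text{otherwise.}
\end{cases}
```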
Contribution
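It proposes an approach that performs limited-horizon searches both to sample states and to update heuristics. Each state's heuristic is updated from the shortest path to the search frontier rather than from a single neighbor, with the aim of producing more accurate cost-to-go estimates.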
Findings
Improved heuristic accuracy in shortest-path problems.
Enhanced performance over single-step update methods.
Effective integration of limited-horizon search in reinforcement learning.
Abstract
Many sequential decision-making problems can be formulated as shortest-path problems, where the objective is to reach a goal state from a given starting state. Heuristic search is a standard approach for solving such problems, relying on a heuristic function to estimate the cost to the goal from any given state. Recent approaches leverage reinforcement learning to learn heuristics by applying deep approximate value iteration. These methods typically rely on single-step Bellman updates, where the heuristic of a state is updated based on its best neighbor and the corresponding edge cost. This work proposes a generalized approach that enhances both state sampling and heuristic updates by performing limited-horizon searches and updating each state's heuristic based on the shortest path to the search frontier, incorporating both edge costs and the heuristic values of frontier states.
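The following is a minimal Python sketch of how such a limited-horizon update target could be computed, shown next to the single-step target for comparison. It rests on several assumptions that are not the authors' implementation:

- The search is best-first, ordered by g + h (A*-style).
- The search is capped by a hypothetical `max_expansions` budget.
- The function names and the tie-breaking scheme are illustrative.

The sketch computes the target only for the root state. It takes the minimum over frontier states of path cost plus heuristic value, with goal states counting as heuristic 0.

```python
import heapq
import math
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

State = Hashable
# successors(x) yields (next_state, edge_cost) pairs
Successors = Callable[[State], Iterable[Tuple[State, float]]]
Heuristic = Callable[[State], float]
GoalTest = Callable[[State], bool]


def single_step_target(s: State, successors: Successors, h: Heuristic, is_goal: GoalTest) -> float:
    """One-step Bellman target: min over successors of edge cost + h(successor)."""
    if is_goal(s):
        return 0.0
    return min(
        (c + (0.0 if is_goal(t) else h(t)) for t, c in successors(s)),
        default=math.inf,  # dead end: no successors
    )


def limited_horizon_target(
    s: State, successors: Successors, h: Heuristic, is_goal: GoalTest, max_expansions: int
) -> float:
    """Hypothetical limited-horizon target for the root s: run an A*-ordered search
    for at most max_expansions expansions, then return min over the frontier of g + h."""

    def hv(x: State) -> float:
        return 0.0 if is_goal(x) else h(x)  # goals have cost-to-go 0

    g: Dict[State, float] = {s: 0.0}  # best known path cost from s
    counter = 0  # unique tie-breaker so heapq never compares states
    open_heap: List[Tuple[float, int, float, State]] = [(hv(s), counter, 0.0, s)]
    closed = set()
    expansions = 0
    while open_heap and expansions < max_expansions:
        _, _, gx, x = heapq.heappop(open_heap)
        if x in closed or gx > g[x]:
            continue  # stale heap entry
        if is_goal(x):
            return gx  # a popped goal has the lowest g + h on the frontier
        closed.add(x)
        expansions += 1
        for y, c in successors(x):
            gy = gx + c
            # Simplification: closed states are never reopened, even if the
            # learned heuristic is inconsistent.
            if y not in closed and gy < g.get(y, math.inf):
                g[y] = gy
                counter += 1
                heapq.heappush(open_heap, (gy + hv(y), counter, gy, y))
    # Frontier = generated but unexpanded states whose g values are up to date.
    return min(
        (gy + hv(y) for _, _, gy, y in open_heap if y not in closed and gy == g[y]),
        default=math.inf,
    )
```

With `max_expansions=1`, only the root is expanded, so the frontier is exactly its successors and the target reduces to the single-step Bellman target. Larger budgets let the target reflect longer paths through the graph and not just the immediate neighbors' estimates. Under this sketch's assumptions, a zero heuristic with an unlimited budget recovers the true shortest-path cost.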
Taxonomy
Topics: Reinforcement Learning in Robotics · Constraint Satisfaction and Optimization · Advanced Multi-Objective Optimization Algorithms
