Relating Reinforcement Learning to Dynamic Programming-Based Planning

Filip V. Georgiev; Kalle G. Timperi; Başak Sakçak; Steven M. LaValle

arXiv:2603.07844 · cs.RO · March 10, 2026

Open Access

TL;DR

This paper examines the connections between reinforcement learning and dynamic programming-based planning, analyzing their differences, their similarities, and the conditions under which they are equivalent, and advocates optimizing true cost rather than tuning arbitrary parameters.

Contribution

It introduces a derandomized version of RL, analyzes the conditions under which RL and planning formulations are equivalent, and advocates optimizing true cost rather than tuning arbitrary parameters.

Findings

01. Derandomized RL matches value iteration performance.

02. Conditions identified where cost minimization equals reward maximization.

03. Discounting can prevent goal achievement under certain conditions.
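
The third finding can be illustrated with a toy model. The sketch below is not taken from the paper: the one-state model, the function name `discounted_greedy_action`, and the cost values are all hypothetical. It assumes a single non-goal state with a cheap self-loop ("wait") and an expensive move to a terminal goal ("go"), and runs discounted value iteration on costs (equivalently, rewards equal to negative costs). Waiting forever has discounted cost `wait_cost / (1 - gamma)`, so whenever that is below `go_cost` the discounted optimum never reaches the goal.

```python
def discounted_greedy_action(gamma, wait_cost=1.0, go_cost=20.0, sweeps=2000):
    """Discounted value iteration on a hypothetical one-state model.

    Returns (value, action) for the single non-goal state, where action is
    the greedy choice after the final sweep. Requires 0 <= gamma < 1.
    """
    value = 0.0  # discounted cost-to-go of the non-goal state
    for _ in range(sweeps):
        wait = wait_cost + gamma * value  # self-loop: remain in the same state
        go = go_cost                      # goal is terminal, zero cost-to-go
        value = min(wait, go)             # Bellman backup (cost minimization)
    action = "go" if go <= wait else "wait"
    return value, action


if __name__ == "__main__":
    for gamma in (0.9, 0.99):
        value, action = discounted_greedy_action(gamma)
        print(f"gamma={gamma}: value={value:.3f}, greedy action={action}")
```

With the assumed costs, `gamma = 0.9` makes waiting forever cost about 10, which beats the goal cost of 20, while `gamma = 0.99` raises the cost of waiting to about 100 and the greedy action reaches the goal. Whether this matches the specific conditions derived in the paper is not something this page states.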

Abstract

This paper bridges some of the gap between optimal planning and reinforcement learning (RL), both of which share roots in dynamic programming applied to sequential decision making or optimal control. Whereas planning typically favors deterministic models, goal termination, and cost minimization, RL tends to favor stochastic models, infinite-horizon discounting, and reward maximization in addition to learning-related parameters such as the learning rate and greediness factor. A derandomized version of RL is developed, analyzed, and implemented to yield performance comparisons with value iteration and Dijkstra's algorithm using simple planning models. Next, mathematical analysis shows: 1) conditions under which cost minimization and reward maximization are equivalent, 2) conditions for equivalence of single-shot goal termination and infinite-horizon episodic learning, and 3) conditions…
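
As a rough illustration of the planning side of this comparison, the sketch below computes the optimal cost-to-go on a small deterministic model in two ways: by undiscounted value iteration with goal termination, and by Dijkstra's algorithm run backward from the goal. It is a minimal sketch under stated assumptions, not the authors' implementation: the graph `GRAPH`, the goal `GOAL`, and both function names are hypothetical, and edge costs are assumed non-negative.

```python
import heapq

# Hypothetical deterministic planning model: GRAPH[state] = {next_state: cost}.
GRAPH = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"c": 2.0, "d": 6.0},
    "c": {"d": 3.0},
    "d": {},
}
GOAL = "d"


def value_iteration(graph, goal):
    """Undiscounted cost-to-go via repeated Bellman backups to a fixed point."""
    cost_to_go = {s: float("inf") for s in graph}
    cost_to_go[goal] = 0.0  # goal termination: no further cost at the goal
    changed = True
    while changed:
        changed = False
        for s, edges in graph.items():
            if s == goal or not edges:
                continue
            best = min(c + cost_to_go[t] for t, c in edges.items())
            if best < cost_to_go[s]:
                cost_to_go[s] = best
                changed = True
    return cost_to_go


def dijkstra_to_goal(graph, goal):
    """Cost-to-go via Dijkstra's algorithm on the reversed graph."""
    reverse = {s: [] for s in graph}
    for s, edges in graph.items():
        for t, c in edges.items():
            reverse[t].append((s, c))
    cost_to_go = {s: float("inf") for s in graph}
    cost_to_go[goal] = 0.0
    frontier = [(0.0, goal)]
    while frontier:
        d, t = heapq.heappop(frontier)
        if d > cost_to_go[t]:
            continue  # stale queue entry
        for s, c in reverse[t]:
            if d + c < cost_to_go[s]:
                cost_to_go[s] = d + c
                heapq.heappush(frontier, (d + c, s))
    return cost_to_go


if __name__ == "__main__":
    print(value_iteration(GRAPH, GOAL))
    print(dijkstra_to_goal(GRAPH, GOAL))
```

On this toy graph both routines return the same cost-to-go for every state. Value iteration sweeps all states until nothing changes, whereas Dijkstra's algorithm settles each state once in order of increasing cost.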

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research