Memoryless Exact Solutions for Deterministic MDPs with Sparse Rewards
Joshua R. Bertram, Peng Wei

TL;DR
This paper introduces an exact, memory-efficient algorithm for deterministic MDPs with sparse rewards that computes optimal policies without depending on the size of the state space, enabling scalable decision-making.
Contribution
The paper presents a novel algorithm that computes exact solutions for deterministic sparse-reward MDPs with complexity independent of state-space size, together with a companion method for following the optimal policy on demand.
Findings
The algorithm computes optimal policies with complexity that depends only on the number of reward sources and the number of actions.
Its efficiency and accuracy are demonstrated against value iteration on tractable MDPs.
Memory and time complexity are independent of the total number of states.
Abstract
We propose an algorithm for deterministic continuous Markov Decision Processes with sparse rewards that computes the optimal policy exactly with no dependency on the size of the state space. The algorithm has time complexity of $O(|R|^3 \times |A|^2)$ and memory complexity of $O(|R| \times |A|)$, where $|R|$ is the number of reward sources and $|A|$ is the number of actions. Furthermore, we describe a companion algorithm that can follow the optimal policy from any initial state without computing the entire value function, instead computing on-demand the value of states as they are needed. The algorithm to solve the MDP does not depend on the size of the state space for either time or memory complexity, and the ability to follow the optimal policy is linear in time and space with the path length of following the optimal policy from the initial state. We demonstrate the algorithm…
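To make the idea of on-demand policy following concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a much simpler setting (an obstacle-free grid with four moves, positive rewards, absorbing reward sources, and no source whose reward is exceeded by the discounted reward of another source). Under those assumptions the value of a state has a closed form, so a policy can be followed by evaluating only the neighbors of the current state. All names (`value`, `follow_policy`, `value_iteration`) are hypothetical and chosen for illustration.

```python
from itertools import product

GAMMA = 0.9
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def value(state, sources, gamma=GAMMA):
    """Closed-form value under the stated assumptions: the best reward
    over all sources, discounted by Manhattan distance to that source."""
    r, c = state
    return max(
        reward * gamma ** (abs(r - sr) + abs(c - sc))
        for (sr, sc), reward in sources.items()
    )


def follow_policy(start, sources, shape, gamma=GAMMA):
    """Walk greedily from `start` to a reward source, computing values
    only for the neighbors of the current state (no value table)."""
    path = [start]
    state = start
    while state not in sources:
        neighbors = [
            (state[0] + dr, state[1] + dc)
            for dr, dc in ACTIONS
            if 0 <= state[0] + dr < shape[0] and 0 <= state[1] + dc < shape[1]
        ]
        state = max(neighbors, key=lambda s: value(s, sources, gamma))
        path.append(state)
    return path


def value_iteration(sources, shape, gamma=GAMMA, sweeps=200):
    """Reference solution over the full state space, for comparison only.
    Reward sources are absorbing and keep their reward as their value."""
    V = {s: 0.0 for s in product(range(shape[0]), range(shape[1]))}
    V.update(sources)
    for _ in range(sweeps):
        for s in V:
            if s in sources:
                continue
            V[s] = max(
                gamma * V[(s[0] + dr, s[1] + dc)]
                for dr, dc in ACTIONS
                if (s[0] + dr, s[1] + dc) in V
            )
    return V


# Example: two reward sources on a 6x6 grid (illustrative values).
sources = {(0, 5): 1.0, (5, 0): 0.8}
path = follow_policy((5, 5), sources, (6, 6))
V = value_iteration(sources, (6, 6))
```

In this sketch, following the policy costs one closed-form evaluation per neighbor per step, so the work grows with the path length, the number of sources, and the number of actions, while `value_iteration` must sweep every state. On this small grid the two agree to within floating-point tolerance.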
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Machine Learning and Algorithms
