Memoryless Exact Solutions for Deterministic MDPs with Sparse Rewards

Joshua R. Bertram; Peng Wei

arXiv:1805.07220·cs.LG·May 21, 2018

TL;DR

This paper introduces an exact algorithm for deterministic MDPs with sparse rewards whose time and memory costs depend on the number of reward sources and actions rather than on the size of the state space.

Contribution

The paper presents an algorithm that computes exact solutions for deterministic sparse-reward MDPs with time and memory complexity independent of state space size, and a companion algorithm that follows the optimal policy from any initial state by computing state values on demand.

Findings

01

The algorithm computes optimal policies with complexity that depends only on the number of reward sources and the number of actions.

02

The algorithm's efficiency and accuracy are demonstrated against value iteration on tractable MDPs.

03

Memory and time complexity are independent of the total number of states.

Abstract

We propose an algorithm for deterministic continuous Markov Decision Processes with sparse rewards that computes the optimal policy exactly with no dependency on the size of the state space. The algorithm has time complexity of $O(|R|^{3} \times |A|^{2})$ and memory complexity of $O(|R| \times |A|)$, where $|R|$ is the number of reward sources and $|A|$ is the number of actions. Furthermore, we describe a companion algorithm that can follow the optimal policy from any initial state without computing the entire value function, instead computing on-demand the value of states as they are needed. The algorithm to solve the MDP does not depend on the size of the state space for either time or memory complexity, and the ability to follow the optimal policy is linear in time and space with the path length of following the optimal policy from the initial state. We demonstrate the algorithm…
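For context, the kind of baseline that a state-space-independent bound is contrasted with can be sketched as follows. This is standard value iteration on a small deterministic grid, not the authors' algorithm. The grid layout, the four-move action set, the reward-on-entry convention, the discount factor, and every name here (`step`, `value_iteration`, `follow_greedy`) are assumptions made for illustration only. Each sweep touches every state, which is the dependence on the number of states that the abstract says the proposed method avoids.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]
Action = Tuple[int, int]

# Four deterministic grid moves (an assumption for this sketch).
ACTIONS: List[Action] = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(state: State, action: Action, size: int) -> State:
    """Deterministic transition: move one cell, staying put at the grid edge."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    return (row, col)


def value_iteration(
    size: int,
    rewards: Dict[State, float],
    gamma: float = 0.9,
    tol: float = 1e-9,
) -> Dict[State, float]:
    """Standard in-place value iteration; reward is collected on entering a state."""
    values = {(r, c): 0.0 for r in range(size) for c in range(size)}
    while True:
        delta = 0.0
        # Every sweep visits all size * size states.
        for s in values:
            best = max(
                rewards.get(nxt, 0.0) + gamma * values[nxt]
                for nxt in (step(s, a, size) for a in ACTIONS)
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def follow_greedy(
    start: State,
    values: Dict[State, float],
    rewards: Dict[State, float],
    size: int,
    max_steps: int,
    gamma: float = 0.9,
) -> List[State]:
    """Follow the greedy policy induced by a precomputed value table."""
    path = [start]
    state = start
    for _ in range(max_steps):
        state = max(
            (step(state, a, size) for a in ACTIONS),
            key=lambda nxt: rewards.get(nxt, 0.0) + gamma * values[nxt],
        )
        path.append(state)
    return path


if __name__ == "__main__":
    sparse_rewards = {(2, 2): 1.0}  # a single reward source on a 5x5 grid
    table = value_iteration(5, sparse_rewards)
    print(follow_greedy((0, 0), table, sparse_rewards, 5, max_steps=4))
```

In this sketch the value table has one entry per state, so both memory and per-sweep time grow with the grid size even though there is only one reward source.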

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Machine Learning and Algorithms