Fast Online Exact Solutions for Deterministic MDPs with Sparse Rewards

Joshua R. Bertram, Xuxi Yang, and Peng Wei

arXiv:1805.02785·cs.LG·May 18, 2018·1 cites

TL;DR

This paper presents a novel exact and efficient algorithm for solving deterministic, continuous MDPs with sparse rewards, significantly reducing computation time compared to classical methods, especially in environments like grid worlds.

Contribution

The paper introduces a new algorithm that solves deterministic, sparse-reward MDPs exactly with improved computational efficiency and lower memory requirements.

Findings

01

The algorithm achieves $O(|R|^2 \times |A|^2 \times |S|)$ time complexity.

02

Memory complexity is $O(|S| + |R| \times |A|)$.

03

Numerical experiments demonstrate superior computational performance.
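
To give a rough sense of scale for the two bounds above, here is a worked example with hypothetical sizes that do not come from the paper: a $100 \times 100$ grid world ($|S| = 10^4$), four actions ($|A| = 4$), and five reward sources ($|R| = 5$). Big-O notation hides constant factors, so these are orders of magnitude, not operation counts.

$$
|R|^2 \times |A|^2 \times |S| = 5^2 \times 4^2 \times 10^4 = 25 \times 16 \times 10^4 = 4 \times 10^6
$$

$$
|S| + |R| \times |A| = 10^4 + 5 \times 4 = 10\,020
$$

Under these assumed sizes the time bound grows linearly with the number of states, while the memory bound is dominated by the $|S|$ term.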

Abstract

Markov Decision Processes (MDPs) are a mathematical framework for modeling sequential decision making under uncertainty. The classical approaches for solving MDPs are well known and have been widely studied, and some of them rely on approximation techniques to solve MDPs with large state and/or action spaces. However, most of these classical solution approaches and their approximation techniques still take considerable computation time to converge and usually must be recomputed if the reward function changes. This paper introduces a novel alternative approach for exactly and efficiently solving deterministic, continuous MDPs with sparse reward sources. When the environment is such that the "distance" between states can be determined in constant time, e.g., in a grid world, our algorithm offers $O(|R|^2 \times |A|^2 \times |S|)$ time complexity, where $|R|$ is the number of reward sources, $|A|$ is the number…
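
The abstract's point about constant-time "distance" can be illustrated with a minimal sketch. This is not the authors' algorithm; it covers only a simplified special case chosen for illustration: an obstacle-free, deterministic grid world with a single positive, absorbing reward source and discount factor `gamma`. Under those assumptions the optimal value of a state is the reward discounted by the Manhattan distance to the source, so it can be written down directly without iterating. The function names (`manhattan`, `closed_form_values`, `value_iteration`) and the convention that the goal state holds value `reward` are assumptions made for this example. Classical value iteration is included only as a baseline to check the closed form against.

```python
def manhattan(a, b):
    """Constant-time distance between two grid cells (no obstacles assumed)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closed_form_values(width, height, goal, reward, gamma):
    """Value of each cell for ONE absorbing reward source: reward * gamma**distance.

    Illustrative special case only, not the paper's multi-reward algorithm.
    """
    return {
        (x, y): reward * gamma ** manhattan((x, y), goal)
        for x in range(width)
        for y in range(height)
    }


def value_iteration(width, height, goal, reward, gamma, tol=1e-12):
    """Classical in-place value iteration on the same deterministic grid."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    values = {(x, y): 0.0 for x in range(width) for y in range(height)}
    values[goal] = reward  # assumed convention: the goal is absorbing with fixed value
    while True:
        delta = 0.0
        for state in values:
            if state == goal:
                continue
            best = 0.0
            for dx, dy in moves:
                nxt = (state[0] + dx, state[1] + dy)
                if nxt not in values:  # moving off the grid leaves the agent in place
                    nxt = state
                best = max(best, gamma * values[nxt])
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


if __name__ == "__main__":
    closed = closed_form_values(5, 4, goal=(3, 1), reward=10.0, gamma=0.9)
    iterated = value_iteration(5, 4, goal=(3, 1), reward=10.0, gamma=0.9)
    worst = max(abs(closed[s] - iterated[s]) for s in closed)
    print(f"largest difference between the two methods: {worst:.2e}")
```

In this sketch the closed form touches each state once, whereas value iteration sweeps the whole grid repeatedly until the values stop changing. With several reward sources or with obstacles the simple formula no longer applies as written, and handling that case is what the paper addresses.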

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems