Fast Online Exact Solutions for Deterministic MDPs with Sparse Rewards
Joshua R. Bertram, Xuxi Yang, and Peng Wei

TL;DR
This paper presents a novel exact and efficient algorithm for solving deterministic, continuous MDPs with sparse rewards, significantly reducing computation time compared to classical methods, especially in environments like grid worlds.
Contribution
The paper introduces a new algorithm that solves deterministic, sparse-reward MDPs exactly with improved computational efficiency and lower memory requirements.
Findings
Algorithm achieves $O(|R|^2 \times |A|^2 \times |S|)$ time complexity.
Memory complexity is $O(|S| + |R| \times |A|)$.
Numerical experiments demonstrate superior computational performance.
Abstract
Markov Decision Processes (MDPs) are a mathematical framework for modeling sequential decision making under uncertainty. The classical approaches for solving MDPs are well known and have been widely studied, some of which rely on approximation techniques to solve MDPs with large state and/or action spaces. However, most of these classical solution approaches and their approximation techniques still take considerable computation time to converge and usually must be recomputed if the reward function changes. This paper introduces a novel alternative approach for exactly and efficiently solving deterministic, continuous MDPs with sparse reward sources. When the environment is such that the "distance" between states can be determined in constant time, e.g. grid world, our algorithm offers $O(|R|^2 \times |A|^2 \times |S|)$ time complexity, where $|R|$ is the number of reward sources, $|A|$ is the number…
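To illustrate why a constant-time "distance" matters, here is a minimal Python sketch. It is not the paper's algorithm. It assumes an obstacle-free, 4-connected grid world with a single positive, absorbing reward source, where the optimal value of a state at distance $d$ from the source is the reward times $\gamma^d$. The function names (`manhattan`, `closed_form_value`, `value_iteration`) and the reward convention are illustrative assumptions. The sketch checks the closed form against plain value iteration on a small grid.

```python
def manhattan(a, b):
    """Constant-time distance between two grid cells (assumes no obstacles)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closed_form_value(state, goal, reward, gamma):
    """Value of heading straight to a single reward source.

    Assumption: the goal cell is absorbing and worth `reward`, and each
    step toward it discounts the value by `gamma`.
    """
    return reward * gamma ** manhattan(state, goal)


def value_iteration(width, height, goal, reward, gamma):
    """Baseline: synchronous value iteration on the same assumed grid model."""
    values = {(x, y): 0.0 for x in range(width) for y in range(height)}
    values[goal] = reward
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    # width + height sweeps exceed the largest Manhattan distance in the grid,
    # so every cell has received its final value by the last sweep.
    for _ in range(width + height):
        new_values = dict(values)
        for (x, y) in values:
            if (x, y) == goal:
                continue  # the reward source keeps its fixed value
            best = 0.0
            for dx, dy in moves:
                nxt = (x + dx, y + dy)
                if nxt in values:  # moves that leave the grid are unavailable
                    best = max(best, gamma * values[nxt])
            new_values[(x, y)] = best
        values = new_values
    return values


if __name__ == "__main__":
    goal, reward, gamma = (3, 1), 10.0, 0.9
    iterated = value_iteration(5, 4, goal, reward, gamma)
    worst_gap = max(
        abs(iterated[s] - closed_form_value(s, goal, reward, gamma))
        for s in iterated
    )
    print("largest gap between the two methods:", worst_gap)
```

Under these assumptions the closed form needs one distance evaluation per state, whereas the baseline sweeps the whole grid repeatedly. Handling several interacting reward sources, as the paper does, requires more than this single-source sketch.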
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems
