Regular Decision Processes for Grid Worlds
Nicky Lenaers, Martijn van Otterlo

TL;DR
This paper experimentally investigates regular decision processes, which support non-Markovian reward and transition functions. It provides a tool chain, algorithmic extensions for online, incremental learning, and an empirical evaluation of model-free and model-based solution algorithms in grid worlds.
Contribution
It introduces a tool chain and algorithmic extensions for regular decision processes, aimed at online, incremental learning in non-Markovian grid world environments.
Findings
Model-free and model-based algorithms evaluated
Successful application to non-Markovian grid worlds
Enhanced decision-making under complex temporal dependencies
Abstract
Markov decision processes are typically used for sequential decision making under uncertainty. For many settings, however, ranging from constrained or safe specifications to various kinds of temporal (non-Markovian) dependencies in task and reward structures, extensions are needed. To that end, interest has grown in recent years in combinations of reinforcement learning and temporal logic, that is, combinations of flexible behavior learning methods with robust verification and guarantees. In this paper we describe an experimental investigation of the recently introduced regular decision processes, which support both non-Markovian reward functions and non-Markovian transition functions. In particular, we provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms,…
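To make the notion of a non-Markovian reward in a grid world concrete, here is a minimal Python sketch. It is not the paper's tool chain or any of its algorithms. The grid cells, the KEY and GOAL positions, and the function names are all hypothetical, chosen only for illustration. The reward for reaching the goal depends on whether a key cell was visited earlier, so it cannot be computed from the current cell alone. A two-state automaton that tracks this history makes the reward a function of the pair (automaton state, cell).

```python
from typing import List, Tuple

Cell = Tuple[int, int]

# Hypothetical cells on a 3x3 grid, chosen only for illustration.
KEY: Cell = (0, 2)
GOAL: Cell = (2, 2)


def dfa_step(q: int, cell: Cell) -> int:
    """Two-state automaton: 0 = key not yet visited, 1 = key visited."""
    if q == 0 and cell == KEY:
        return 1
    return q


def reward(q: int, cell: Cell) -> float:
    """Pay 1.0 at the goal only if the key was visited earlier on the path."""
    return 1.0 if (q == 1 and cell == GOAL) else 0.0


def total_reward(path: List[Cell]) -> float:
    """Sum rewards along a path while tracking the automaton state."""
    q, total = 0, 0.0
    for cell in path:
        q = dfa_step(q, cell)
        total += reward(q, cell)
    return total


# Both paths end in the same cell but have different histories.
with_key = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
without_key = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

Both paths end in the same cell, yet only the one that passes through the key cell is rewarded. That history dependence is what a plain Markov reward over grid cells cannot express.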
