Efficient PAC Reinforcement Learning in Regular Decision Processes

Alessandro Ronca; Giuseppe De Giacomo

arXiv:2105.06784·cs.AI·May 19, 2022


TL;DR

This paper shows that near-optimal policies in regular decision processes can be PAC-learned in time polynomial in a set of parameters describing the underlying process, and argues that this parameter set is minimal.

Contribution

It shows that a near-optimal policy for a regular decision process can be PAC-learned in polynomial time in a set of parameters describing the process, and argues that this set is minimal and reasonably captures the difficulty of the process.
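For readers unfamiliar with the term, the usual PAC criterion for policy learning can be written as below. This is the standard textbook formulation, not a statement copied from the paper; the symbols $\hat{\pi}$ (learned policy), $V^{\hat{\pi}}$ and $V^{*}$ (value of the learned and optimal policies), $\epsilon$ (accuracy) and $\delta$ (failure probability) are notation assumed here for illustration.

```latex
\Pr\left[\, V^{\hat{\pi}} \ge V^{*} - \epsilon \,\right] \ge 1 - \delta
```

Under this reading, "polynomial time" means that the resources needed to reach the guarantee grow polynomially in the accuracy and confidence requirements and in the parameters describing the decision process.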

Findings

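01. Near-optimal policies can be PAC-learned in time polynomial in the parameters that describe the decision process.

02. The identified parameter set is argued to be minimal and to reasonably capture the difficulty of a regular decision process.

03. The result applies to history-dependent (non-Markov) environments whose transition and reward functions depend on the whole history regularly, as in regular languages.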

Abstract

Recently regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and it reasonably captures the difficulty of a regular decision process.
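The idea that history dependence can be captured by a finite transducer can be illustrated with a toy example. The sketch below is hypothetical and not taken from the paper: the state names, observations (`"a"`, `"b"`), actions (`"left"`, `"right"`), and reward table are all assumptions chosen for illustration. The reward depends on the parity of `"b"` observations seen so far, which is a regular property of the history tracked by a two-state automaton. For simplicity, only the reward is history-dependent here; observations are drawn uniformly.

```python
import random

# Hypothetical toy transducer (illustrative only, not from the paper).
# The automaton state records whether an even or odd number of "b"
# observations has occurred in the history so far.
TRANSITIONS = {
    ("even", "a"): "even",
    ("even", "b"): "odd",
    ("odd", "a"): "odd",
    ("odd", "b"): "even",
}

# Reward depends on the automaton state (a summary of the history) and the action.
REWARDS = {
    ("even", "left"): 1.0,
    ("even", "right"): 0.0,
    ("odd", "left"): 0.0,
    ("odd", "right"): 1.0,
}


def transducer_state(observations, start="even"):
    """Run the automaton over the observation history and return its state."""
    state = start
    for obs in observations:
        state = TRANSITIONS[(state, obs)]
    return state


def run_episode(policy, steps, seed=0):
    """Roll out a policy that maps the full observation history to an action."""
    rng = random.Random(seed)
    history = []
    total = 0.0
    for _ in range(steps):
        action = policy(history)
        state = transducer_state(history)
        total += REWARDS[(state, action)]
        # Assumption: observations are uniform and independent of the state.
        history.append(rng.choice(["a", "b"]))
    return total


def history_aware_policy(history):
    """Acts on the automaton state, i.e. on a regular function of the history."""
    return "left" if transducer_state(history) == "even" else "right"


def memoryless_policy(history):
    """Ignores the history entirely."""
    return "left"
```

In this toy setting, a policy that tracks the automaton state collects the full reward at every step, while a policy that ignores the history cannot do so in general. It only illustrates why the automaton state is a sufficient summary of the history; it says nothing about how such a state space would be learned from data.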
