Tractable Offline Learning of Regular Decision Processes
Ahana Deb, Roberto Cipollone, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi

TL;DR
This paper advances offline reinforcement learning in non-Markovian environments called Regular Decision Processes (RDPs) with two techniques: a language-based pseudometric, which reduces sample complexity, and Count-Min-Sketch counting, which reduces memory use.
Contribution
It introduces a novel pseudometric based on formal languages and replaces naive counting with Count-Min-Sketch (CMS) for memory efficiency, overcoming two limitations of previous offline RL algorithms for RDPs, notably RegORL.
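To make the memory argument concrete, here is a minimal Count-Min-Sketch in Python. It is an illustrative sketch of the generic data structure, not the paper's implementation: the class name, the `width` and `depth` parameters, the BLAKE2b-based hashing, and the example keys are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate counter using a fixed depth x width table of integers."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One salted hash per row; the salt makes the rows behave like
        # different hash functions (an assumption of this sketch).
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one cell in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum over rows
        # never underestimates the true count.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )


# Hypothetical usage: count (history suffix, observation) pairs.
cms = CountMinSketch(width=256, depth=4)
cms.add(("a1 o1 a2", "o2"))
cms.add(("a1 o1 a2", "o2"))
print(cms.estimate(("a1 o1 a2", "o2")))
```

Memory is fixed at `width * depth` integers regardless of how many distinct keys are counted, whereas a naive dictionary of counts grows with the number of distinct keys. The price is that estimates may overcount when keys collide.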
Findings
Reduced sample complexity in low language complexity environments.
Memory-efficient approach with Count-Min-Sketch.
Validated improvements through experiments.
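As a rough illustration of what a distance "based on formal languages" can look like, the sketch below compares two samples of traces by the probability mass they assign to each language in a finite family, and takes the largest gap. This is an assumed, simplified form chosen for illustration; the paper's exact definition may differ. The function names and the example languages are hypothetical.

```python
def language_mass(samples, language):
    """Empirical probability that a sampled trace belongs to `language`.

    `language` is a membership predicate: trace -> bool.
    """
    return sum(1 for trace in samples if language(trace)) / len(samples)


def language_distance(samples_p, samples_q, languages):
    """Largest gap in empirical mass over a finite family of languages."""
    return max(
        abs(language_mass(samples_p, lang) - language_mass(samples_q, lang))
        for lang in languages
    )


# Two small samples of traces over the alphabet {a, b}.
p = ["ab", "ab", "ba", "bb"]
q = ["ab", "ba", "ba", "bb"]

# Hypothetical language families, given as membership predicates.
rich_family = [lambda t: t.startswith("a"), lambda t: t.endswith("b")]
poor_family = [lambda t: "a" in t]

print(language_distance(p, q, rich_family))  # 0.25
print(language_distance(p, q, poor_family))  # 0.0
```

The second call shows why such a distance is only a pseudometric: two different samples can be at distance zero when no language in the family separates them. The choice of family therefore controls which differences between distributions are visible.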
Abstract
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs). In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a hidden finite-state automaton. For this reason, many RDP algorithms first reconstruct this unknown dependency using automata learning techniques. In this paper, we show that it is possible to overcome two strong limitations of previous offline RL algorithms for RDPs, notably RegORL. This is accomplished with two original techniques: a new pseudometric based on formal languages, which removes a problematic dependency on $L_\infty^{\mathsf{p}}$-distinguishability parameters, and the adoption of Count-Min-Sketch (CMS) instead of naive counting. The former reduces the number of samples required in…
Taxonomy
Topics: Rough Sets and Fuzzy Logic
