Decision Making Agent Searching for Markov Models in Near-Deterministic World
Gabor Matuz, Andras Lorincz

TL;DR
This paper introduces a learning architecture that tests and corrects for non-Markovity in partially observed environments using a deterministic factored finite state model, demonstrated in a near-deterministic Ms. Pac-Man game.
Contribution
It presents a novel architecture combining combinatorial policy optimization with a learnable deterministic model to handle non-Markovian environments effectively.
Findings
Effective in the near-deterministic Ms. Pac-Man game
Can test and correct the Markov property of behavioral states
Analyzes architecture through evolutionary, individual, and social learning
Abstract
Reinforcement learning has solid foundations, but becomes inefficient in partially observed (non-Markovian) environments. Thus, a learning agent, born with a representation and a policy, might wish to investigate to what extent the Markov property holds. We propose a learning architecture that utilizes combinatorial policy optimization to overcome non-Markovity and to develop efficient behaviors that are easy to inherit, tests the Markov property of the behavioral states, and corrects against non-Markovity by running a deterministic factored Finite State Model, which can be learned. We illustrate the properties of the architecture in the near-deterministic Ms. Pac-Man game. We analyze the architecture from the point of view of evolutionary, individual, and social learning.
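To make the idea of testing the Markov property in a near-deterministic world more concrete, here is a minimal sketch. It is not the paper's architecture or test; it is a generic illustration under the assumption that states are discrete symbols and that "Markov" means the current state alone (nearly) determines the next one. The function names, the `tolerance` parameter, and the toy trace are all hypothetical.

```python
from collections import Counter, defaultdict

def next_state_counts(sequence, order=1):
    """Count the successors of every length-`order` history in the sequence."""
    counts = defaultdict(Counter)
    for i in range(order, len(sequence)):
        history = tuple(sequence[i - order:i])
        counts[history][sequence[i]] += 1
    return counts

def ambiguous_histories(sequence, order=1, tolerance=0.0):
    """Return histories whose most frequent successor occurs with
    empirical probability below 1 - tolerance (illustrative criterion)."""
    flagged = set()
    for history, successors in next_state_counts(sequence, order).items():
        total = sum(successors.values())
        top = max(successors.values())
        if top / total < 1.0 - tolerance:
            flagged.add(history)
    return flagged

# Toy trace (made up): state B leads to C after A, but to E after D.
trace = list("ABCDBEABCDBE")
first_order = ambiguous_histories(trace, order=1)   # B alone does not fix its successor
second_order = ambiguous_histories(trace, order=2)  # one extra step of history resolves it
```

In this toy trace, the first-order check flags state B, and adding one step of history removes the ambiguity. This is the kind of correction that extra internal state could provide, here approximated simply by lengthening the history.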
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Language and cultural evolution
