AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models
Conor Heins, Toon Van de Maele, Alexander Tschantz, Hampus Linander, Dimitrije Markovic, Tommaso Salvatori, Corrado Pezzato, Ozan Catal, Ran Wei, Magnus Koudahl, Marco Perin, Karl Friston, Tim Verbelen, Christopher Buckley

TL;DR
AXIOM is a novel object-centric model that learns to play games efficiently in minutes by combining active inference principles with dynamic scene modeling, enabling rapid, data-efficient, and generalizable reinforcement learning.
Contribution
It introduces a flexible, object-based generative model that expands and refines itself online, bridging active inference and deep RL for fast, general game learning.
Findings
AXIOM masters various games within 10,000 steps
It uses fewer parameters than traditional deep RL methods
It achieves high data efficiency without gradient-based training
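The idea of a model that expands online and is fit without gradient-based training can be illustrated with a toy growing mixture. This is a minimal sketch under stated assumptions, not AXIOM's actual procedure: the reviews below describe variational inference with split-merge steps over mixture components, whereas this example swaps in a hard squared-distance threshold and running-mean updates. The class name `GrowingMixture` and the parameter `new_component_threshold` are hypothetical.

```python
class GrowingMixture:
    """Toy online mixture with isotropic components (illustrative only)."""

    def __init__(self, new_component_threshold):
        # Squared distance beyond which a point spawns a new component.
        # Hypothetical hyperparameter, not taken from the paper.
        self.threshold = new_component_threshold
        self.means = []   # one mean vector per component
        self.counts = []  # number of points assigned to each component

    def update(self, x):
        """Assign x to a component, creating one if none fits; return its index."""
        if self.means:
            dists = [sum((a - b) ** 2 for a, b in zip(x, m)) for m in self.means]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] <= self.threshold:
                # Closed-form running-mean update: no gradients involved.
                self.counts[k] += 1
                n = self.counts[k]
                self.means[k] = [m + (a - m) / n for a, m in zip(x, self.means[k])]
                return k
        # No existing component explains x well enough: expand the model.
        self.means.append(list(x))
        self.counts.append(1)
        return len(self.means) - 1


model = GrowingMixture(new_component_threshold=1.0)
stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.0, 5.2), (0.1, 0.1)]
assignments = [model.update(x) for x in stream]
print(assignments, len(model.means))
```

On this stream the model starts empty and ends with two components, one per cluster of points. A probabilistic version would replace the distance threshold with a comparison of predictive likelihoods and would also allow components to be merged.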
Abstract
Current deep reinforcement learning (DRL) approaches achieve state-of-the-art performance in various domains, but struggle with data efficiency compared to human learning, which leverages core priors about objects and their interactions. Active inference offers a principled framework for integrating sensory information with prior knowledge to learn a world model and quantify the uncertainty of its own beliefs and predictions. However, active inference models are usually crafted for a single task with bespoke knowledge, so they lack the domain flexibility typical of DRL approaches. To bridge this gap, we propose a novel architecture that integrates a minimal yet expressive set of core priors about object-centric dynamics and interactions to accelerate learning in low-data regimes. The resulting approach, which we call AXIOM, combines the usual data efficiency and interpretability of…
Peer Reviews
Decision: Submitted to ICLR 2026
The paper defines a large model with many different mixture model components for learning a variational posterior from observed trajectories within an RL problem. It applies variational inference and a split-merge structure over mixture components to develop an inferential procedure that can be used in planning. Instead of learning to optimize reward from the outset, it uses an approximate Bayesian approach for concurrent world modelling and refinement of parameter distribution training. Arguably this
(Please sort out the references - there is a significant lack of care in the references; capitalisation is all over the place - Gauss is a proper noun, etc. This does not reflect well on the work). The paper has an overabundance of gratuitous references to the work of Karl Friston. Bayesian agent architectures have been around for decades prior to Parr et al. Beliefs are always updated incrementally as new evidence emerges; it doesn't need another Friston reference to establish that. Nor is mixt
1. A novel way to employ model-based planning (without any neural networks or gradient optimization) that could, in the future, be an avenue for fast adaptation. 2. I appreciate that the authors provided anonymized code -- I had a brief look at it.
1. The core claim of "robustness to environmental perturbations" is not necessarily applicable to AXIOM in particular. As the authors point out, Dreamer and AXIOM are both similarly robust to such perturbations, and BBF instead outperforms both when it comes to robustness. So, I'm not fully convinced of this claim of robustness. 2. There are too many components in the model -- which isn't inherently a bad thing -- however, I wonder if this will scale up to more realistic observations. For insta
**Strengths** **1. Good Writing:** The paper presents a clear, fully probabilistic framework that decomposes perception, dynamics, and interaction into modular mixture components. AXIOM’s architecture is transparent, where each latent variable has a defined physical or semantic meaning (slot, type, mode, interaction). **2. High sample-efficiency:** Within only 10k interaction steps, AXIOM achieves competent performance across multiple tasks, often surpassing baselines such as DreamerV3 and BBF
**Weaknesses:** **1. Generalization to complex tasks:** The Gameworld-10k suite is tailored to object-centric, sparse-interaction dynamics with low visual complexity. While useful for probing the proposed priors, it risks design–method coupling and may inflate relative gains versus deep baselines optimized for high-dimensional, long-horizon settings. Claims of generality are not justified without external baselines. The paper acknowledges not scaling to “complicated control tasks typical of the
Taxonomy
Topics: Artificial Intelligence in Games · Machine Learning and Data Classification
Methods: Sparse Evolutionary Training
