AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models

Conor Heins; Toon Van de Maele; Alexander Tschantz; Hampus Linander; Dimitrije Markovic; Tommaso Salvatori; Corrado Pezzato; Ozan Catal; Ran Wei; Magnus Koudahl; Marco Perin; Karl Friston; Tim Verbelen; Christopher Buckley

arXiv:2505.24784·cs.AI·June 2, 2025

Open Access · 3 Reviews

TL;DR

AXIOM is a novel object-centric model that learns to play games in minutes by combining active inference principles with dynamic scene modeling, enabling data-efficient and generalizable reinforcement learning.

Contribution

It introduces a flexible, object-based generative model that expands and refines itself online, bridging active inference and deep RL for fast, general game learning.
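
To make the idea of a model that "expands and refines itself online" concrete, here is a minimal toy sketch of a mixture that adds a component whenever a new observation is far from every existing one, and otherwise refines the nearest component with a running mean. It is not the paper's procedure: the reviews describe AXIOM as using variational inference over mixture components, whereas this sketch uses hard nearest-component assignment. The class name `GrowingMixture` and the parameter `new_component_distance` are hypothetical names chosen for illustration.

```python
import math


class GrowingMixture:
    """Toy online mixture (illustrative only, not AXIOM's inference scheme).

    Each point is hard-assigned to the nearest component; if no component
    is within `new_component_distance`, a new component is created.
    """

    def __init__(self, new_component_distance):
        # Hypothetical threshold controlling when the model expands.
        self.new_component_distance = new_component_distance
        self.means = []   # one mean vector (list of floats) per component
        self.counts = []  # number of points assigned to each component

    def update(self, point):
        """Assign `point` to a component, growing the model if none is close."""
        if self.means:
            distances = [math.dist(point, mean) for mean in self.means]
            k = min(range(len(distances)), key=distances.__getitem__)
            if distances[k] <= self.new_component_distance:
                # Refine: incremental running-mean update, no gradients.
                self.counts[k] += 1
                step = 1.0 / self.counts[k]
                self.means[k] = [m + step * (x - m)
                                 for m, x in zip(self.means[k], point)]
                return k
        # Expand: no existing component explains the point well enough.
        self.means.append(list(point))
        self.counts.append(1)
        return len(self.means) - 1


model = GrowingMixture(new_component_distance=1.0)
points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.0, 5.2)]
assignments = [model.update(p) for p in points]
print(assignments, len(model.means))
```

In this toy run the first two points share one component and the last two share a second, so the model grows only when the data call for it. The updates are closed-form, which mirrors the "without gradient-based training" finding in spirit only.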

Findings

01 · AXIOM masters various games within 10,000 steps

02 · It uses fewer parameters than traditional deep RL methods

03 · Achieves high data efficiency without gradient-based training

Abstract

Current deep reinforcement learning (DRL) approaches achieve state-of-the-art performance in various domains, but struggle with data efficiency compared to human learning, which leverages core priors about objects and their interactions. Active inference offers a principled framework for integrating sensory information with prior knowledge to learn a world model and quantify the uncertainty of its own beliefs and predictions. However, active inference models are usually crafted for a single task with bespoke knowledge, so they lack the domain flexibility typical of DRL approaches. To bridge this gap, we propose a novel architecture that integrates a minimal yet expressive set of core priors about object-centric dynamics and interactions to accelerate learning in low-data regimes. The resulting approach, which we call AXIOM, combines the usual data efficiency and interpretability of…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The paper defines a large model with many different mixture model components for learning a variational posterior on observing trajectories within an RL problem. It applies variational inference and a mixture component split-merge structure to develop an inferential procedure that can be used in planning. Instead of learning to optimize reward from the outset, it uses an approximate Bayesian approach for concurrent world modelling and refinement of parameter distribution training. Arguably this

Weaknesses

(Please sort out the references - there is a significant lack of care in the references, capitalisation is all over the place - Gauss is a proper noun, etc. This does not reflect well on the work). The paper has an overabundance of gratuitous references to the work of Karl Friston. Bayesian agent architectures have been around for decades prior to Parr et al. Beliefs are always updated incrementally as new evidence emerges; it doesn't need another Friston reference to establish that. Nor is mixt

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. A novel way to employ model-based planning (without any neural networks or gradient optimization) that could, in the future, be an avenue for fast adaptation. 2. I appreciate that the authors provided anonymized code -- I had a brief look at it.

Weaknesses

1. The core claim of "robustness to environmental perturbations" is not necessarily applicable to AXIOM in particular. As the authors point out, Dreamer and AXIOM are both similarly robust to such perturbations, and BBF instead outperforms both when it comes to robustness. So, I'm not fully convinced of this claim of robustness. 2. There are too many components in the model -- which isn't inherently a bad thing -- however, I wonder if this will scale up to more realistic observations. For insta

Reviewer 03 · Rating 6 · Confidence 2

Strengths

**1. Good Writing:** The paper presents a clear, fully probabilistic framework that decomposes perception, dynamics, and interaction into modular mixture components. AXIOM’s architecture is transparent, where each latent variable has a defined physical or semantic meaning (slot, type, mode, interaction). **2. High sample-efficiency:** Within only 10k interaction steps, AXIOM achieves competent performance across multiple tasks, often surpassing baselines such as DreamerV3 and BBF

Weaknesses

**1. Generalization to complex tasks:** The Gameworld-10k suite is tailored to object-centric, sparse-interaction dynamics with low visual complexity. While useful for probing the proposed priors, it risks design–method coupling and may inflate relative gains versus deep baselines optimized for high-dimensional, long-horizon settings. Claims of generality are not justified without external baselines. The paper acknowledges not scaling to “complicated control tasks typical of the

Taxonomy

Topics: Artificial Intelligence in Games · Machine Learning and Data Classification

Methods: Sparse Evolutionary Training