SPARTAN: A Sparse Transformer World Model Attending to What Matters
Anson Lei, Bernhard Schölkopf, Ingmar Posner

TL;DR
SPARTAN is a sparse, Transformer-based world model that learns context-dependent interaction structures between objects, improving interpretability, adaptability, and robustness in dynamic environments.
Contribution
The paper introduces SPARTAN, a novel sparse Transformer world model that effectively captures local causal interactions and adapts to environment changes.
Findings
Outperforms state-of-the-art methods in object-centric world modeling
Learns accurate local causal interaction graphs
Shows improved few-shot adaptation and robustness
Abstract
Capturing the interactions between entities in a structured way plays a central role in world models that flexibly adapt to changes in the environment. Recent works motivate the benefits of models that explicitly represent the structure of interactions and formulate the problem as discovering local causal structures. In this work, we demonstrate that reliably capturing these relationships in complex settings remains challenging. To remedy this shortcoming, we postulate that sparsity is a critical ingredient for the discovery of such local structures. To this end, we present the SPARse TrANsformer World model (SPARTAN), a Transformer-based world model that learns context-dependent interaction structures between entities in a scene. By applying sparsity regularisation on the attention patterns between object-factored tokens, SPARTAN learns sparse, context-dependent interaction graphs that…
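The abstract describes sparsity regularisation on the attention patterns between object-factored tokens, from which sparse, context-dependent interaction graphs are read off. The plain-Python sketch below shows one way such a mechanism could look. It is not the authors' implementation. Several choices are assumptions made for illustration: per-edge sigmoid gates, an expected-edge-count penalty, the 0.5 threshold, the default weight `lam`, and every function name (`edge_probabilities`, `gated_messages`, `sparsity_penalty`, `interaction_graph`, `total_loss`). Independent sigmoid gates stand in for softmax attention. Softmax rows always sum to one, so an L1 penalty on softmax weights would be constant and could not encourage sparsity.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def edge_probabilities(tokens: Matrix, wq: Matrix, wk: Matrix) -> Matrix:
    """Hypothetical gating: p[i][j] is the probability that object j influences object i."""
    queries = [matvec(wq, t) for t in tokens]
    keys = [matvec(wk, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))
    return [[sigmoid(sum(a * b for a, b in zip(q, k)) / scale) for k in keys]
            for q in queries]


def gated_messages(tokens: Matrix, wv: Matrix, probs: Matrix) -> Matrix:
    """Each object sums the value vectors of other objects, weighted by its edge gates."""
    values = [matvec(wv, t) for t in tokens]
    dim = len(values[0])
    return [[sum(p_ij * v[d] for p_ij, v in zip(row, values)) for d in range(dim)]
            for row in probs]


def sparsity_penalty(probs: Matrix) -> float:
    """Expected number of active edges; adding it to the loss pushes gates toward 0."""
    return sum(sum(row) for row in probs)


def interaction_graph(probs: Matrix, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Read off a discrete graph as (target, source) pairs whose gate exceeds the threshold."""
    return [(i, j) for i, row in enumerate(probs)
            for j, p in enumerate(row) if p > threshold]


def total_loss(prediction_loss: float, probs: Matrix, lam: float = 0.01) -> float:
    """Prediction loss plus a weighted sparsity penalty (lam is a hypothetical default)."""
    return prediction_loss + lam * sparsity_penalty(probs)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    objects = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]]
    p = edge_probabilities(objects, identity, identity)
    print(interaction_graph(p))           # → [(0, 0), (1, 1), (2, 2)]
    print(round(sparsity_penalty(p), 3))  # → 4.944
```

The example uses identity projections, so each object scores highly only against itself. Orthogonal pairs get a score of 0, which puts their gates at exactly 0.5. Because the threshold is strict, those edges stay out of the graph. In a trained model, the projections would be learned jointly with the prediction loss. The sparsity weight then trades prediction accuracy against the number of edges kept.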
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Dense Connections · Layer Normalization · Adam · Attention Dropout · Multi-Head Attention · Residual Connection
