FACTS: A Factored State-Space Framework For World Modelling
Li Nanbo, Firas Laakom, Yucheng Xu, Wenyi Wang, Jürgen Schmidhuber

TL;DR
The paper introduces FACTS, a novel recurrent framework for spatial-temporal world modelling that efficiently captures complex dependencies, supports parallel computation, and outperforms existing models across various tasks.
Contribution
FACTS is a new graph-structured memory framework with permutation invariance and parallel processing, advancing state-space models for world modelling.
Findings
Outperforms or matches state-of-the-art models in diverse tasks
Supports parallel computation for high-dimensional sequences
Provides permutation-invariant and adaptable memory representations
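The parallel-computation finding can be illustrated with a generic linear recurrence, h_t = a_t * h_(t-1) + b_t, which is the kind of update that state-space models evaluate with an associative scan. The sketch below is not taken from the FACTS code; the function names and the scalar setting are assumptions chosen for illustration. It shows that a log-depth scan reproduces the step-by-step result.

```python
def sequential_scan(a, b, h0=0.0):
    """Reference: evaluate h_t = a_t * h_{t-1} + b_t one step at a time."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying `left` first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def doubling_scan(a, b, h0=0.0):
    """Hillis-Steele style inclusive scan over the affine maps (a_t, b_t).

    Illustrative only: the list comprehension runs serially here, but each
    position within one round is independent of the others, so a parallel
    backend could compute a round in one step, for about log2(T) rounds.
    """
    elems = list(zip(a, b))
    n = len(elems)
    shift = 1
    while shift < n:
        elems = [
            elems[i] if i < shift else combine(elems[i - shift], elems[i])
            for i in range(n)
        ]
        shift *= 2
    # elems[t] now holds the composed map from h0 to h_t.
    return [a_cum * h0 + b_cum for a_cum, b_cum in elems]
```

Because `combine` is associative, the maps can be grouped in any order, which is what makes the recurrence parallelizable along the time axis.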
Abstract
World modelling is essential for understanding and predicting the dynamics of complex systems by learning both spatial and temporal dependencies. However, current frameworks, such as Transformers and selective state-space models like Mambas, exhibit limitations in efficiently encoding spatial and temporal structures, particularly in scenarios requiring long-term high-dimensional sequence modelling. To address these issues, we propose a novel recurrent framework, the FACTored State-space (FACTS) model, for spatial-temporal world modelling. The FACTS framework constructs a graph-structured memory with a routing mechanism that learns permutable memory representations, ensuring invariance to input permutations while adapting through selective state-space propagation. Furthermore, FACTS supports parallel computation of high-dimensional sequences. We empirically…
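The routing idea in the abstract can be sketched as attention from memory factors to input nodes. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `route` and `step`, the dot-product scoring, and the fixed `decay` gate (a stand-in for the input-dependent, selective gating of the actual model) are all hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def route(memory, inputs):
    """Each memory factor attends over all input nodes and pools them.

    The softmax runs over the input nodes, so the pooled vector is a
    weighted sum that does not depend on the order of `inputs`.
    """
    d = len(inputs[0])
    routed = []
    for z in memory:
        weights = softmax([dot(z, x) / math.sqrt(d) for x in inputs])
        routed.append(
            [sum(w * x[j] for w, x in zip(weights, inputs)) for j in range(d)]
        )
    return routed


def step(memory, inputs, decay=0.9):
    """One recurrent update: blend each factor with its routed input.

    Assumption: a constant `decay` replaces the learned selective gate.
    """
    routed = route(memory, inputs)
    return [
        [decay * z_j + (1.0 - decay) * u_j for z_j, u_j in zip(z, u)]
        for z, u in zip(memory, routed)
    ]
```

Calling `step` with the same inputs in a different order yields the same memory up to floating-point rounding, which is the invariance property the abstract describes.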
Peer Reviews
Decision: ICLR 2025 Poster
- The proposed architecture introduces a permutable memory structure, allowing flexible handling of unordered or dynamically changing inputs. The paper achieves improved performance over baselines by compressing history efficiently and hence capturing long-term dependencies.
- The paper is easy to read and comprehend.
- The results shown on long-term forecasting are interesting and help the reviewer better understand the implications of the proposed work (especially forecasting with pre-d
- Object-centric video modelling results are a bit weak. It would be interesting to also report results on OBJ3D (another benchmark used in the SlotFormer paper). It would also be helpful to report downstream results such as predictive VQA on CLEVRER and Physion (similar to the experiments in the SlotFormer paper).
1. The paper addresses the critical challenge of input feature variance, an interesting issue in spatial-temporal learning, by introducing a novel method that utilizes a memory-input routing mechanism. This approach effectively manages the dynamic relationships between input features, ensuring robust modeling even when input orders change.
2. The proposed FACTS model is both simple and highly effective due to its memory-input routing mechanism, which dynamically assigns input features to latent
For the slot dynamics prediction experiment, the method proposed in the paper relies on a pre-trained encoder and is not end-to-end, which may limit its applicability.
- The paper is well-written and easy to follow.
- Modular recurrent architectures are well studied for world modelling in recent literature. This paper introduces modularity into SSMs while maintaining their parallel processing capabilities, so the approach seems promising in terms of efficiency.
- One integral component of the model is the attention mechanism that assigns input nodes to latent factors. I believe that similar attention mechanisms, for tasks similar to the ones studied in this paper, have already been explored in various past works [1, 2, 3]. I wonder if the authors could present a comparison of their method to these approaches or at least highlight the differences. Specifically, [2] proposes to also incorporate modularity and factorization into SSMs, it
Taxonomy
Topics: Modeling and Simulation Systems · Simulation Techniques and Applications
