Dissociating model architectures from inference computations
Noor Sajid, Johan Medrano

TL;DR
This paper argues that a sequence model's architecture (how its predictive distribution factorises) can be separated from the computations it performs at inference. It shows that an autoregressive transformer can mimic deep temporal computations when its context access is structured hierarchically during iterative inference.
Contribution
It demonstrates that inducing hierarchical temporal factorization during iterative inference allows autoregressive models to mimic deep temporal computations, decoupling the model architecture from the inference process.
Findings
Inducing hierarchical temporal factorization during iterative inference maintains predictive capacity while instantiating fewer computations.
Autoregressive models can mimic deep temporal computations.
Inference processes are not strictly tied to model architecture.
Abstract
Parr et al. (2025) examine how autoregressive and deep temporal models differ in their treatment of non-Markovian sequence modelling. Building on this, we highlight the need for dissociating model architectures, i.e., how the predictive distribution factorises, from the computations invoked at inference. We demonstrate that autoregressive models can mimic deep temporal computations by structuring context access during iterative inference. Using a transformer trained on next-token prediction, we show that inducing hierarchical temporal factorisation during iterative inference maintains predictive capacity while instantiating fewer computations. This emphasises that processes for constructing and refining predictions are not necessarily bound to their underlying model architectures.
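The idea of structuring context access during iterative inference can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the scheme, so the function names (`full_context`, `hierarchical_context`, `generate`), the parameters (`window`, `stride`, `levels`), and the toy predictor below are all assumptions made for illustration. The sketch keeps recent positions at full resolution and samples older positions at progressively coarser strides, so each prediction step reads fewer positions than a standard autoregressive step.

```python
def full_context(t):
    """Standard autoregressive access: every position before t is visible."""
    return list(range(t))


def hierarchical_context(t, window=4, stride=2, levels=3):
    """Hypothetical hierarchical access pattern (an assumption, not the paper's).

    Level 0 reads the last `window` positions. Each further level reads up to
    `window` older positions, spaced `stride` times further apart than the
    level before it.
    """
    positions = set()
    end = t      # exclusive upper bound of the span read at the current level
    step = 1     # spacing between positions read at the current level
    for _ in range(levels):
        for i in range(window):
            pos = end - 1 - i * step
            if pos < 0:
                break
            positions.add(pos)
        end -= window * step
        if end <= 0:
            break
        step *= stride
    return sorted(positions)


def generate(predict_next, prompt, steps, context_fn):
    """Iterative inference: each step predicts from the positions context_fn allows."""
    tokens = list(prompt)
    for _ in range(steps):
        visible = [tokens[i] for i in context_fn(len(tokens))]
        tokens.append(predict_next(visible))
    return tokens


def toy_predictor(visible):
    """Stand-in for a trained model: increments the most recent visible token."""
    return (visible[-1] + 1) % 10


if __name__ == "__main__":
    print(len(full_context(32)), len(hierarchical_context(32)))
    print(generate(toy_predictor, [0], 5, hierarchical_context))
```

The toy predictor depends only on the most recent token, so both access patterns trivially produce the same sequence here. That agreement says nothing about a real transformer; the sketch only shows how the same autoregressive loop can be run under a different, sparser pattern of context access.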
