HCLSM: Hierarchical Causal Latent State Machines for Object-Centric World Modeling

Jaber Jaber; Osama Jaber

arXiv:2603.29090 · cs.LG · April 1, 2026

TL;DR

HCLSM introduces a hierarchical, object-centric world model that captures causal structure and temporal dynamics, improving future state prediction in video-based environments.

Contribution

The paper presents HCLSM, an architecture that combines object-centric decomposition, hierarchical temporal dynamics, and causal structure learning. It is trained with a two-stage protocol and accelerated by a custom Triton kernel for the SSM scan (38x speedup).

Findings

01. Achieved a next-state prediction loss of 0.008 MSE on the PushT benchmark.

02. Emergent spatial decomposition and learned event boundaries.

03. 38x speedup from a custom Triton kernel for the SSM scan.
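For context on what an SSM scan computes, the sketch below evaluates a scalar linear recurrence h_t = a_t * h_{t-1} + b_t in two ways: step by step, and through an associative combine operator. It is a minimal pure-Python illustration, not the paper's Triton kernel; the function names and the scalar (per-channel) form are assumptions made for the example.

```python
from itertools import accumulate


def sequential_scan(a, b):
    """Compute h_t = a_t * h_{t-1} + b_t with h_{-1} = 0, one step at a time."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def combine(left, right):
    """Compose two affine maps h -> a*h + b (apply `left` first, then `right`).

    This operator is associative, which is the property a parallel scan
    relies on to regroup the work across time steps.
    """
    a_l, b_l = left
    a_r, b_r = right
    return (a_r * a_l, a_r * b_l + b_r)


def associative_scan(a, b):
    """Same recurrence expressed as a prefix scan over (a_t, b_t) pairs.

    itertools.accumulate still runs sequentially here; it only demonstrates
    that the combine operator reproduces the recurrence.
    """
    return [b_acc for _, b_acc in accumulate(zip(a, b), combine)]


if __name__ == "__main__":
    a = [0.5, 0.9, 0.1]   # hypothetical decay coefficients
    b = [1.0, 2.0, 3.0]   # hypothetical input contributions
    print(sequential_scan(a, b))
    print(associative_scan(a, b))
```

Because the combine step is associative, an accelerator implementation can evaluate the prefix in a tree-like order instead of a strict left-to-right loop. How the paper's kernel actually organizes this work is not stated on this page.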

Abstract

World models that predict future states from video remain limited by flat latent representations that entangle objects, ignore causal structure, and collapse temporal dynamics into a single scale. We present HCLSM, a world model architecture that operates on three interconnected principles: object-centric decomposition via slot attention with spatial broadcast decoding; hierarchical temporal dynamics through a three-level engine that combines selective state space models for continuous physics, sparse transformers for discrete events, and compressed transformers for abstract goals; and causal structure learning through graph neural network interaction patterns. HCLSM introduces a two-stage training protocol in which spatial reconstruction forces slot specialization before dynamics prediction begins. We train a 68M-parameter model on the PushT robotic manipulation benchmark from the Open…
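The two-stage protocol described in the abstract can be pictured as a loss-weight schedule: only the reconstruction term is active at first, and the dynamics term is switched on afterwards. The sketch below is a hypothetical illustration. The names `loss_weights`, `total_loss`, and `stage1_steps`, the unit weights, and the choice to keep the reconstruction term active in stage two are all assumptions, not details taken from the paper.

```python
def loss_weights(step, stage1_steps):
    """Return per-term loss weights for a hypothetical two-stage schedule.

    Stage 1 (step < stage1_steps): spatial reconstruction only.
    Stage 2: the dynamics-prediction term is added. Keeping reconstruction
    active in stage 2 is an assumption made for this sketch.
    """
    if step < stage1_steps:
        return {"reconstruction": 1.0, "dynamics": 0.0}
    return {"reconstruction": 1.0, "dynamics": 1.0}


def total_loss(losses, weights):
    """Weighted sum of named loss terms."""
    return sum(weights[name] * value for name, value in losses.items())


if __name__ == "__main__":
    losses = {"reconstruction": 0.5, "dynamics": 2.0}  # made-up values
    for step in (0, 999, 1000):
        w = loss_weights(step, stage1_steps=1000)
        print(step, w, total_loss(losses, w))
```

A hard switch is the simplest way to express "reconstruction before dynamics"; a gradual ramp of the dynamics weight would be an equally plausible reading of the abstract.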

Code & Models

Repositories

rightnow-ai/hclsm (GitHub)
