Causal-JEPA: Learning World Models through Object-Level Latent Interventions
Heejeong Nam, Quentin Le Lidec, Lucas Maes, Yann LeCun, Randall Balestriero

TL;DR
C-JEPA is an object-centric world model whose object-level masking enhances relational understanding, improves counterfactual reasoning, and enables more efficient planning in control tasks.
Contribution
It extends masked joint embedding prediction to object-centric representations, inducing causal latent interventions and improving interaction reasoning.
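The object-level masking idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The array layout (time steps, object slots, features), the zero-valued mask token, the choice to hide an object at every time step, and the names `mask_objects` and `masked_prediction_loss` are all assumptions made for illustration. A real predictor would infer the hidden objects from the visible ones; here the masked input stands in for its output so the example stays self-contained.

```python
import numpy as np


def mask_objects(slots, mask_token, num_masked, rng):
    """Replace the embeddings of randomly chosen objects with a mask token.

    slots: array of shape (T, N, D): T time steps, N object slots, D features.
    mask_token: array of shape (D,), a hypothetical placeholder embedding.
    Returns the masked copy and a boolean array of shape (N,) marking
    which objects were hidden.
    """
    num_objects = slots.shape[1]
    masked_ids = rng.choice(num_objects, size=num_masked, replace=False)
    is_masked = np.zeros(num_objects, dtype=bool)
    is_masked[masked_ids] = True
    masked_slots = slots.copy()
    # Assumption: a masked object is hidden at every time step.
    masked_slots[:, is_masked, :] = mask_token
    return masked_slots, is_masked


def masked_prediction_loss(predicted, target, is_masked):
    """Mean squared error in latent space, computed only on masked objects."""
    diff = predicted[:, is_masked, :] - target[:, is_masked, :]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
slots = rng.normal(size=(4, 5, 8))  # toy object-centric latents
mask_token = np.zeros(8)            # hypothetical mask embedding
masked_slots, is_masked = mask_objects(slots, mask_token, num_masked=2, rng=rng)

# Stand-in for a predictor: the masked input is scored directly against
# the original latents, so the loss reflects only the hidden objects.
loss = masked_prediction_loss(masked_slots, slots, is_masked)
```

Restricting the loss to the hidden objects means the only way to lower it is to recover an object's state from the remaining objects, which is the pressure toward interaction reasoning described above.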
Findings
About 20% absolute improvement in counterfactual reasoning for visual question answering, compared to the same architecture without object-level masking
Achieves comparable control performance using only 1% of latent features
Formal analysis shows that object-level masking induces a causal inductive bias
Abstract
World models require robust relational understanding to support prediction, reasoning, and control. While object-centric representations provide a useful abstraction, they are not sufficient to capture interaction-dependent dynamics. We therefore propose C-JEPA, a simple and flexible object-centric world model that extends masked joint embedding prediction from image patches to object-centric representations. By applying object-level masking that requires an object's state to be inferred from other objects, C-JEPA induces latent interventions with counterfactual-like effects and prevents shortcut solutions, making interaction reasoning essential. Empirically, C-JEPA leads to consistent gains in visual question answering, with an absolute improvement of about 20% in counterfactual reasoning compared to the same architecture without object-level masking. On agent control tasks, C-JEPA…
Taxonomy
Topics: Multimodal Machine Learning Applications · AI-based Problem Solving and Planning · Reinforcement Learning in Robotics
