Causal-JEPA: Learning World Models through Object-Level Latent Interventions

Heejeong Nam; Quentin Le Lidec; Lucas Maes; Yann LeCun; Randall Balestriero

arXiv:2602.11389 · cs.AI · February 13, 2026

TL;DR

C-JEPA introduces an object-centric world model with object-level masking that enhances relational understanding, improves counterfactual reasoning, and enables more efficient planning in control tasks.

Contribution

C-JEPA extends masked joint embedding prediction from image patches to object-centric representations, inducing causal latent interventions and improving interaction reasoning.
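The object-level masking idea can be illustrated with a minimal sketch. It assumes each object is already encoded as one latent vector (a "slot"), hides whole objects rather than image patches, and scores the prediction only on the hidden objects. The function names, the zero-vector mask token, and the mean-of-visible-slots predictor are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mask_objects(slots, mask_token, num_masked, rng):
    """Replace whole object slots with a mask token.

    slots: array of shape (num_objects, dim), one latent vector per object.
    Returns the masked copy and the sorted indices of the hidden objects.
    """
    num_objects = slots.shape[0]
    masked_idx = rng.choice(num_objects, size=num_masked, replace=False)
    masked = slots.copy()
    masked[masked_idx] = mask_token  # the whole object is hidden, not a patch
    return masked, np.sort(masked_idx)

def masked_prediction_loss(predicted, target, masked_idx):
    """Mean squared error computed only on the hidden objects."""
    diff = predicted[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
slots = rng.normal(size=(4, 8))   # hypothetical: 4 objects, 8-dim latents
mask_token = np.zeros(8)          # assumption: a fixed zero token

masked, idx = mask_objects(slots, mask_token, num_masked=1, rng=rng)

# Placeholder predictor: guess the hidden object from the visible ones.
# A real model would use a learned predictor here.
visible = np.delete(masked, idx, axis=0)
predicted = masked.copy()
predicted[idx] = visible.mean(axis=0)

loss = masked_prediction_loss(predicted, slots, idx)
```

Because the hidden object's own latent is unavailable, any predictor in this setup has to rely on the other objects, which is the property the summary above attributes to object-level masking.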

Findings

01. About 20% absolute improvement in counterfactual reasoning on visual question answering, compared with the same architecture without object-level masking

02. Achieves comparable control performance using only 1% of latent features

03. Formal analysis shows object-level masking induces a causal inductive bias

Abstract

World models require robust relational understanding to support prediction, reasoning, and control. While object-centric representations provide a useful abstraction, they are not sufficient to capture interaction-dependent dynamics. We therefore propose C-JEPA, a simple and flexible object-centric world model that extends masked joint embedding prediction from image patches to object-centric representations. By applying object-level masking that requires an object's state to be inferred from other objects, C-JEPA induces latent interventions with counterfactual-like effects and prevents shortcut solutions, making interaction reasoning essential. Empirically, C-JEPA leads to consistent gains in visual question answering, with an absolute improvement of about 20% in counterfactual reasoning compared to the same architecture without object-level masking. On agent control tasks, C-JEPA…

Code & Models

Models

HazelNam/CJEPA (Hugging Face model)

Taxonomy

Topics: Multimodal Machine Learning Applications · AI-based Problem Solving and Planning · Reinforcement Learning in Robotics