TL;DR
This paper introduces Causal Delta Embeddings, a novel latent space representation for interventions that enhances out-of-distribution robustness in causal image models, without requiring extra supervision.
Contribution
It proposes a new method for learning intervention representations called Causal Delta Embeddings, improving OOD robustness in causal image tasks.
Findings
Causal Delta Embeddings outperform baselines in synthetic benchmarks.
The method achieves significant improvements in real-world OOD scenarios.
It demonstrates effectiveness without additional supervision.
Abstract
Causal representation learning has attracted significant research interest during the past few years, as a means for improving model generalization and robustness. Causal representations of interventional image pairs (also called "actionable counterfactuals" in the literature) have the property that only variables corresponding to scene elements affected by the intervention or action are changed between the start state and the end state. While most work in this area has focused on identifying and representing the variables of the scene under a causal model, fewer efforts have focused on representations of the interventions themselves. In this work, we show that an effective strategy for improving out-of-distribution (OOD) robustness is to focus on the representation of actionable counterfactuals in the latent space. Specifically, we propose that an intervention can be represented by a…
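The property described in the abstract, that only the variables affected by an action change between the start and end state, can be illustrated with a toy sketch. This is not the paper's method: it assumes the latent codes are plain lists of floats and that a "delta" is just the element-wise difference between the end-state and start-state codes. The function names `delta_embedding` and `changed_variables`, the tolerance, and the toy values are all hypothetical.

```python
from typing import List


def delta_embedding(z_before: List[float], z_after: List[float]) -> List[float]:
    """Element-wise difference between end-state and start-state latents.

    Assumption: the intervention is summarized by a simple vector difference.
    """
    if len(z_before) != len(z_after):
        raise ValueError("latent vectors must have the same length")
    return [after - before for before, after in zip(z_before, z_after)]


def changed_variables(delta: List[float], tol: float = 1e-6) -> List[int]:
    """Indices of latent variables that moved by more than `tol`."""
    return [i for i, d in enumerate(delta) if abs(d) > tol]


# Toy latents (hypothetical values): four scene variables,
# and an action that alters only the variable at index 2.
z_start = [0.5, -1.0, 0.0, 2.0]
z_end = [0.5, -1.0, 1.0, 2.0]

delta = delta_embedding(z_start, z_end)
affected = changed_variables(delta)
```

In this toy case the delta is non-zero in a single coordinate, which is the sparsity pattern the abstract attributes to causal representations of interventional image pairs. Real encoder outputs would not be exactly sparse, so the tolerance would need to be chosen with care.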
Peer Reviews
Decision·ICLR 2026 Poster
* tackles a very important problem of learning robust and interpretable representations
* leverages pretrained vision encoders in a good way and moves away from toy-dataset-only evaluations
* clear conceptual framing of intervention representation problem
* strong quantitative OOD gains; well-executed ablations
* visualization & semantic structure analysis support claims
* only evaluates one ViT backbone
* requires heavy supervision that is only possible with synthetic data
* empirical gains limited in real-world settings
* lacks exploration of confounding effects or imperfect interventions
1. The paper tackles a challenging problem in causal representation learning, namely the disentanglement of interventions, using an original approach. The geometry of Delta embeddings could potentially convey very meaningful information, as hinted by the experiments and visualizations in the appendix.
2. The paper is well-written and easy to understand. The theoretical section complements the description of the approach well, justifying it accurately.
3. The experiments on out-of-distribution
Experiments are conducted on a single benchmark (Causal Triplet), which limits the generalizability of the findings (although the experiments in OOD settings mitigate this issue). Using larger backbone models on more datasets would further strengthen the contributions.
The idea of learning representations for actions is interesting, and it is a reasonable choice to learn these representations using interventional data. I believe Causal Delta Embeddings (CDEs) will have practical applications in robotics and other interactive domains.
I include my major concerns under "weaknesses" and minor concerns (mainly related to writing) under "questions." My most important concerns are W1, W2, and W5 (d, e). Most weaknesses/questions can be answered without experiments. I will raise my score if my concerns are addressed. **W1. Interventional vs counterfactual**: A major confusion I have is whether CDE requires interventional or counterfactual data, the latter being a stricter requirement. In lines 194-197, the goal is stated to learn
