Learning Robust Intervention Representations with Delta Embeddings

Panagiotis Alimisis; Christos Diou

arXiv:2508.04492·cs.CV·March 3, 2026

TL;DR

This paper introduces Causal Delta Embeddings, a novel latent space representation for interventions that enhances out-of-distribution robustness in causal image models, without requiring extra supervision.

Contribution

It proposes a new method for learning intervention representations called Causal Delta Embeddings, improving OOD robustness in causal image tasks.

Findings

1. Causal Delta Embeddings outperform baselines in synthetic benchmarks.
2. The method achieves significant improvements in real-world OOD scenarios.
3. It demonstrates effectiveness without additional supervision.

Abstract

Causal representation learning has attracted significant research interest during the past few years as a means of improving model generalization and robustness. Causal representations of interventional image pairs (also called "actionable counterfactuals" in the literature) have the property that only the variables corresponding to scene elements affected by the intervention or action change between the start state and the end state. While most work in this area has focused on identifying and representing the variables of the scene under a causal model, fewer efforts have focused on representations of the interventions themselves. In this work, we show that an effective strategy for improving out-of-distribution (OOD) robustness is to focus on the representation of actionable counterfactuals in the latent space. Specifically, we propose that an intervention can be represented by a…
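The property stated in the abstract, that only the variables affected by an intervention change between the start state and the end state, can be made concrete with a toy sketch. Everything below is an illustrative assumption: the variable names, the hand-written latent values, and the helper functions `delta_embedding` and `affected_variables` are hypothetical and are not the paper's encoder, loss, or training procedure. The element-wise difference of latents is used here only to show what a sparse change looks like.

```python
# Toy illustration only: hand-written latent codes, not outputs of a real encoder.
# Each position stands for one hypothetical causal variable of the scene.
VARIABLES = ["object_position", "object_state", "background", "lighting"]


def delta_embedding(z_start, z_end):
    """Element-wise difference between end-state and start-state latents."""
    return [end - start for start, end in zip(z_start, z_end)]


def affected_variables(delta, names, tol=1e-6):
    """Names of the variables whose latent value changed by more than tol."""
    return [name for name, d in zip(names, delta) if abs(d) > tol]


# Hypothetical latents for an image pair in which an action changes
# only the state of the object (for example, a drawer being opened).
z_start = [0.2, 0.0, 0.7, 0.5]
z_end = [0.2, 1.0, 0.7, 0.5]

delta = delta_embedding(z_start, z_end)
changed = affected_variables(delta, VARIABLES)

print(delta)    # [0.0, 1.0, 0.0, 0.0]
print(changed)  # ['object_state']
```

In this toy setting the difference vector is nonzero only at the variable the action touched, which is the sparsity the abstract describes for causal representations of interventional image pairs.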

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

* tackles a very important problem of learning robust and interpretable representations
* leverages pretrained vision encoders in a good way and moves away from toy-dataset-only evaluations
* clear conceptual framing of intervention representation problem
* strong quantitative OOD gains; well-executed ablations
* visualization & semantic structure analysis support claims

Weaknesses

* only evaluates one ViT backbone
* requires heavy supervision that is only possible with synthetic data
* empirical gains limited in real-world
* lacks exploration of confounding effects or imperfect interventions

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper tackles a challenging problem in causal representation learning, namely the disentanglement of interventions, using an original approach. The geometry of Delta embeddings could potentially convey very meaningful information, as hinted by the experiments and visualizations in the appendix.
2. The paper is well-written and easy to understand. The theoretical section complements the description of the approach well, justifying it accurately.
3. The experiments on out-of-distribution

Weaknesses

Experiments are conducted on a single benchmark (Causal Triplet), which limits the generalizability of the findings (although the experiments in OOD settings mitigate this issue). Using larger backbone models on more datasets would further strengthen the contributions.

Reviewer 03 · Rating 4 · Confidence 3

Strengths

The idea of learning representations for actions is interesting, and it is a reasonable choice to learn these representations using interventional data. I believe Causal Delta Embeddings (CDEs) will have practical applications in robotics and other interactive domains.

Weaknesses

I include my major concerns under "weaknesses" and minor concerns (mainly related to writing) under "questions." My most important concerns are W1, W2, and W5 (d, e). Most weaknesses/questions can be answered without experiments. I will raise my score if my concerns are addressed. **W1. Interventional vs counterfactual**: A major confusion I have is whether CDE requires interventional or counterfactual data, the latter being a stricter requirement. In lines 194-197, the goal is stated to learn
