Identifying Representations for Intervention Extrapolation

Sorawit Saengkyongam, Elan Rosenfeld, Pradeep Ravikumar, Niklas Pfister, Jonas Peters

arXiv:2310.04295 · cs.LG · March 6, 2024 · 2 cites

Open Access · 3 Reviews

TL;DR

This paper introduces a method called Rep4Ex that learns identifiable representations enabling accurate extrapolation of intervention effects on outcomes, even with non-linear transformations and unseen interventions.

Contribution

It provides theoretical guarantees for intervention extrapolation using identifiable representations and proposes a practical method enforcing linear invariance constraints.
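One way to build intuition for a linear invariance constraint is the toy check below. It is a sketch under stated assumptions, not the paper's procedure: the scalar model `z = 2.0 * a + v`, the helper `invariance_gap`, and the use of a correlation with `a ** 3` as a crude dependence measure are all hypothetical choices made for illustration. The idea is that an affine transformation of the latent leaves a residual that no longer varies with the action once the linear effect of the action is regressed out, whereas a non-linear transformation does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
a = rng.uniform(-1.0, 1.0, size=n)   # exogenous action
v = rng.normal(scale=0.3, size=n)    # latent noise, independent of a
z = 2.0 * a + v                      # assumed linear model between action and latent

def invariance_gap(rep, a):
    """Crude, hypothetical diagnostic: regress `rep` linearly on `a`,
    then return |corr| between the residual and a**3."""
    design = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(design, rep, rcond=None)
    resid = rep - design @ coef
    return abs(np.corrcoef(resid, a ** 3)[0, 1])

gap_affine = invariance_gap(-1.5 * z + 4.0, a)   # affine transform of the latent
gap_nonlinear = invariance_gap(z ** 3, a)        # non-linear transform of the latent
print(gap_affine, gap_nonlinear)
```

In this toy example the affine representation gives a gap near zero while the cubic one does not, which is the kind of signal a constraint of this type could penalize during training.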

Findings

01. Successfully predicts effects of unseen interventions

02. Identifiable representations are sufficient for extrapolation

03. Method works with any autoencoder type

Abstract

The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress in questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: predicting how interventions affect an outcome, even when those interventions are not observed at training time, and show that identifiable representations can provide an effective solution to this task even if the interventions affect the outcome non-linearly. Our setup includes an outcome Y, observed features X, which are generated as a non-linear transformation of latent features Z, and exogenous action variables A, which influence Z. The objective of intervention extrapolation is to predict…
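The setup described in the abstract can be sketched as a small simulation. Everything concrete here is an assumption made for illustration: the scalar coefficient `M`, the mixing function `g`, the outcome function `ell`, the noise distributions, and the dependence between the outcome noise `u` and the latent noise `v` are hypothetical choices, not taken from the paper. The sketch draws training data with actions in [-1, 1] and then evaluates the mean outcome under an action value outside that range, which is possible here only because the simulator is known.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 2.0  # hypothetical scalar coefficient of the linear map from A to Z

def g(z):
    # Hypothetical injective, noiseless mixing: 1-D latent Z -> 2-D observed X.
    return np.stack([z + np.tanh(z), z ** 3], axis=-1)

def ell(z):
    # Hypothetical non-linear effect of the latent Z on the outcome Y.
    return np.sin(z) + 0.5 * z ** 2

def simulate(a, rng):
    n = a.shape[0]
    v = rng.normal(size=n)                   # latent noise
    u = 0.8 * v + 0.6 * rng.normal(size=n)   # assumption: outcome noise depends on v
    z = M * a + v                            # action A influences latent Z
    return g(z), z, ell(z) + u               # (X, Z, Y)

# Training data: actions are only observed in [-1, 1].
a_train = rng.uniform(-1.0, 1.0, size=200_000)
x_train, z_train, y_train = simulate(a_train, rng)

# Unseen intervention: because A is exogenous in this simulator, fixing
# a = a_new yields samples of Y under do(A = a_new).
a_new = 3.0
_, _, y_do = simulate(np.full(200_000, a_new), rng)
mc_mean = y_do.mean()

# Closed form for this toy model with v ~ N(0, 1) and E[u] = 0.
closed_form = np.sin(M * a_new) * np.exp(-0.5) + 0.5 * ((M * a_new) ** 2 + 1.0)
print(mc_mean, closed_form)
```

A learner in this setting would see only `(a_train, x_train, y_train)`; the latent `z_train` and the functions `g` and `ell` are hidden, which is what makes predicting the outcome at `a_new` an extrapolation problem.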

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 4

Strengths

- Important problem. Can be thought of as OOD estimation of intervention effects through learning latent representations.
- Extremely well written paper!
- Crucially, E[Y | do(A=a')] \neq E[Y | A=a'] for a' not in the support of A.
- The propositions appear exactly where the reader starts thinking about the question, and are easily understandable.
- The proofs of extrapolation are not straightforward. Several causal tools have been brought together to show the validity of extrapolation (invar

Weaknesses

- I did not see any major weakness. One model assumption that could be weakened in future work is the linearity assumption.
- The role of Wiener’s Tauberian theorem in the proof that the hidden representation is identifiable, up to affine transformation, is not clear to me. Since this has been claimed in the abstract, it would be helpful to delineate where it has been used.

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 4

Strengths

Using CF to achieve intervention extrapolation is a nice idea. The theoretical analysis is serious and detailed (but I did not check the proofs in the Appendix). The paper is quite well written.

Weaknesses

*Technical novelties seem to be weak*. Theorem 4 seems to be an adaptation of the CF approach in (Newey et al., 1999), and Theorem 6 seems to be an adaptation of the IV approach in (D’Haultfoeuille, 2011). If there are some technical novelties, they should be discussed and compared to the original works; otherwise, I suggest being more explicit about this weakness. *Some assumptions are strong*; particularly, the linear model between Z and A, and the injective model and noiseless model between

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 4

Strengths

- The paper studies intervention extrapolation under a well-chosen set of assumptions, which I find more appealing than the previously studied scenarios in the literature.
- To the best of my knowledge, the proposed identification strategy is novel.
- The manuscript is clear and well-written.
- The proposed algorithm is straightforward and practical.

Weaknesses

- The assumptions are clear mathematically but might seem opaque to readers unfamiliar with the literature. The authors may want to give an example of what some of the key assumptions would imply in a simple setup.
- Many of the structural assumptions are not testable, and it is unclear to me when one should be comfortable using the proposed method. Also see questions.

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)