Learning Treatment Representations for Downstream Instrumental Variable Regression
Shiangyi Lin, Hui Lan, Vasilis Syrgkanis

TL;DR
This paper introduces a novel method for learning treatment representations in instrumental variable regression that explicitly incorporates instruments during representation learning, improving identification and reducing bias in high-dimensional settings.
Contribution
The paper proposes a new approach to construct treatment representations by integrating instrumental variables into the learning process, addressing limitations of traditional dimension reduction methods.
Findings
Instrument-informed representations improve outcome prediction.
The method reduces omitted variable bias in high-dimensional IV regression.
Empirical results outperform conventional two-stage approaches.
Abstract
Traditional instrumental variable (IV) estimators face a fundamental constraint: they can only accommodate as many endogenous treatment variables as available instruments. This limitation becomes particularly challenging in settings where the treatment is presented in a high-dimensional and unstructured manner (e.g. descriptions of patient treatment pathways in a hospital). In such settings, researchers typically resort to applying unsupervised dimension reduction techniques to learn a low-dimensional treatment representation prior to implementing IV regression analysis. We show that such methods can suffer from substantial omitted variable bias due to implicit regularization in the representation learning step. We propose a novel approach to construct treatment representations by explicitly incorporating instrumental variables during the representation learning process. Our approach…
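The two claims in the abstract can be illustrated with a toy linear simulation. This is a minimal sketch, not the paper's method. The data-generating process, the coefficient values, and all variable names are assumptions chosen for illustration, and "keeping only the first treatment coordinate" is a crude stand-in for an unsupervised reduction that discards instrument-driven variation. Part (1) shows that one instrument cannot identify two endogenous treatment coordinates, because the first-stage fit has rank 1. Part (2) shows that an IV estimate on the reduced treatment absorbs the effect of the dropped coordinate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process; every value here is an illustrative assumption.
z = rng.normal(size=n)            # single instrument
u = rng.normal(size=n)            # unobserved confounder
a = np.array([1.0, 0.5])          # effect of the instrument on each treatment coordinate
theta = np.array([1.0, 2.0])      # true causal effect of each treatment coordinate

# Two endogenous treatment coordinates, both driven by z and by u.
t = np.outer(z, a) + u[:, None] + rng.normal(size=(n, 2))
y = t @ theta + u + rng.normal(size=n)

# (1) Order condition: projecting two treatments onto one instrument
# yields a rank-1 fitted matrix, so the second stage is not identified.
Z = z[:, None]
first_stage_coef = np.linalg.lstsq(Z, t, rcond=None)[0]   # shape (1, 2)
t_hat = Z @ first_stage_coef
rank_first_stage = np.linalg.matrix_rank(t_hat)
print("rank of first-stage fit:", rank_first_stage)

# (2) Reduce the treatment to one coordinate, then run just-identified IV
# (the Wald ratio). The instrument still moves the dropped coordinate,
# so the exclusion restriction fails for the reduced treatment.
r = t[:, 0]
beta_iv = (z @ y) / (z @ r)

# Population limit of the ratio: theta_1 + theta_2 * a_2 / a_1, not theta_1.
expected_limit = theta[0] + theta[1] * a[1] / a[0]
print("IV estimate on reduced treatment:", beta_iv)
print("true effect of kept coordinate:", theta[0])
print("large-sample limit of the IV estimate:", expected_limit)
```

In this sketch the IV estimate converges to `theta[0] + theta[1] * a[1] / a[0]` rather than `theta[0]`, which is the kind of omitted variable bias the abstract attributes to representations learned without reference to the instrument.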
Peer Reviews
Decision · Submitted to ICLR 2026
- The IV setting with potentially high-dimensional confounded treatments is an important and underexplored research direction in causal inference research.
- The motivation showing that omitted variable bias can also come from dimensionality reduction/representation learning of treatments (and not only of confounders, as most previous work showed) is an interesting and important finding.
- Their method, ensuring that the IV information is maintained in the treatment representation, is a nice and
The clarity of the paper could be improved and implications of different parts of the method could be mentioned.
- Part of the main motivation is that IV estimators “can only accommodate as many endogenous treatment variables as available instruments”. I think while in general it makes sense that effect estimation with high dimensional treatments is challenging, this statement should be explained more and shown in more detail.
- I think the related work is a bit limited to some specific works an
The authors pinpoint how standard unsupervised dimensionality reduction of high dimensional treatments can violate the exclusion restriction by discarding instrument driven variation, a problem they term omitted treatment bias. The proposed solution of instrument guided representation learning represents a creative fusion of causal inference and representation learning, directly addressing this limitation in prior two stage approaches. The authors develop a complete framework with specialized m
The theoretical foundation of this paper relies on a set of strong structural assumptions that may be difficult to satisfy in real world applications. A key example is the core assumption of joint independence. This assumption requires the instrument, the confounder representation, and the orthogonal components to be fully independent. This is particularly challenging to guarantee with high dimensional and complex data. Furthermore, prerequisites such as the invertibility of the encoding and dec
- The problem setting is interesting and novel to the best of my knowledge
- Line 67: The paper states that causal representation learning is uncommon in causal inference, but common in causal discovery. This is not true. The paper thus lacks a sufficient discussion of related work on representation learning in causal inference tasks.
- The method requires very strong untestable assumptions, limiting its applicability in practice.
- The paper neither contains a discussion on the method, its limitations, or the results, nor a conclusion, leaving it an unfinished work.
Taxonomy
Topics: Machine Learning in Healthcare · Machine Learning and Algorithms
