Overlap-weighted orthogonal meta-learner for treatment effect estimation over time
Konstantin Hess, Dennis Frauen, Mihaela van der Schaar, Stefan Feuerriegel

TL;DR
This paper introduces an overlap-weighted orthogonal meta-learner designed for estimating heterogeneous treatment effects over time, effectively addressing overlap issues and improving estimation stability in low-overlap scenarios.
Contribution
It proposes a novel overlap-weighted orthogonal meta-learner with Neyman-orthogonality for robust, model-agnostic treatment effect estimation in time-varying settings.
Findings
Improved stability in low-overlap regions.
Robustness against nuisance function misspecification.
Effective with transformer and LSTM models.
Abstract
Estimating heterogeneous treatment effects (HTEs) in time-varying settings is particularly challenging, as the probability of observing certain treatment sequences decreases exponentially with longer prediction horizons. Thus, the observed data contain little support for many plausible treatment sequences, which creates severe overlap problems. Existing meta-learners for the time-varying setting typically assume adequate treatment overlap, and thus suffer from exploding estimation variance when the overlap is low. To address this problem, we introduce a novel overlap-weighted orthogonal (WO) meta-learner for estimating HTEs that targets regions in the observed data with high probability of receiving the interventional treatment sequences. This offers a fully data-driven approach through which our WO-learner can counteract instabilities as in existing meta-learners and thus obtain more…
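The overlap problem described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' WO-learner: the per-step propensities are drawn from a hypothetical uniform range, the function names are illustrative, and the "overlap-style" weight is simply the sequence propensity itself, used as a bounded stand-in for weighting toward high-overlap regions. It shows that the probability of a full treatment sequence is a product of per-step probabilities, so inverse-propensity weights grow rapidly with the horizon while the bounded weight stays in [0, 1].

```python
import random


def sequence_propensity(step_propensities):
    """Probability of observing a whole treatment sequence:
    the product of the per-step propensities."""
    prob = 1.0
    for p in step_propensities:
        prob *= p
    return prob


def compare_weights(horizon, n_units=1000, seed=0):
    """Return the largest inverse-propensity weight and the largest
    overlap-style weight across simulated units (illustrative only)."""
    rng = random.Random(seed)
    max_inverse, max_overlap = 0.0, 0.0
    for _ in range(n_units):
        # Assumption: per-step propensities of the target treatment
        # are drawn uniformly from [0.1, 0.9].
        steps = [rng.uniform(0.1, 0.9) for _ in range(horizon)]
        prob = sequence_propensity(steps)
        max_inverse = max(max_inverse, 1.0 / prob)  # unbounded as prob -> 0
        max_overlap = max(max_overlap, prob)        # always within [0, 1]
    return max_inverse, max_overlap


for horizon in (1, 5, 10):
    inv, ov = compare_weights(horizon)
    print(f"horizon={horizon:2d}  max inverse weight={inv:14.1f}  "
          f"max overlap-style weight={ov:.4f}")
```

With a horizon of 1 the inverse weight cannot exceed roughly 10 under these assumptions, whereas at longer horizons it can be orders of magnitude larger, which is the variance issue the abstract attributes to low overlap.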
Peer Reviews
Decision·ICLR 2026 Poster
- The paper proposes a model-agnostic framework for adjusting for time-varying confounders.
- The paper provides a theoretical foundation for orthogonality.
- Implementation code is available for review.
- The claim that weighting by overlap improves estimator stability is weakly justified. By emphasizing high-overlap regions, the learner avoids extreme inverse-propensity weights but can ignore potentially outcome-informative regions. Is there a trade-off between propensity and outcome predictions?
- The motivation behind the study problem (treatment effect estimation over time) needs to be further clarified, particularly in relation to its real-world applicability. It is not evident whether the
- Compared with other meta-learners in this field, the study proposes a novel method that addresses the challenges other learners face in the low-overlap scenario, which is common under time-varying treatment assignment.
- The overlap weighted orthogonal
- The study conducts both theoretical and empirical analyses to demonstrate the advantages of the proposed method.
- The results in Table 2 demonstrate a dramatic improvement in performance. And the performance is uniformly the
- Can you please revise the writing of the problem setup? Maybe provide some examples to illustrate what the treatment looks like. Based on my understanding of your problem, the two sequences should each be a list of 1s and 0s, correct? I.e., $a = [1, 0, 0, 1], b = [0, 1, 1, 1]$.
- One very relevant study mentioned in the paper is the IVW method. As stated in the IVW paper, the framework is a composition of IVW and the DR-learner [1]. The paper provides only a very brief description in lines 116-125. I wonder here
- Clear motivation, and the proposed estimator addresses the problem of insufficient overlap.
- The estimator is flexible and is robust to nuisance errors as well.
- Comprehensive experiments that address different settings. The results demonstrate the effectiveness of the proposed method at addressing the different issues discussed in the paper.
- While the paper shows that the proposed population risk is Neyman-orthogonal, papers on meta-learners usually also provide an error analysis showing that the error terms from the nuisance functions are of higher order, which is not presented in the paper.
- The main innovation is to address the problem of limited overlap, so it would be nice to have some theorems that showcase how this estimator has better behavior (e.g., variance) in those regimes.
Taxonomy
Topics: Advanced Causal Inference Techniques · Machine Learning in Healthcare · Statistical Methods and Inference
