# Beyond reweighting: On the predictive role of covariate shift in effect generalization

**Authors:** Ying Jin, Naoki Egami, Dominik Rothenhäusler

PMC · DOI: 10.1073/pnas.2427181122 · Proceedings of the National Academy of Sciences of the United States of America · 2025-11-03

## TL;DR

This paper shows that changes in observable factors can predict changes in unobserved factors when generalizing scientific findings across populations.

## Contribution

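The paper introduces a statistical theory of distributional shift in which the observable covariate shift predicts the strength of the unobserved conditional shift, and it shows that exploiting this relationship yields more reliable uncertainty quantification when generalizing effects to new populations.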

## Key findings

- Covariate shift can predict the strength of unobserved conditional shift across populations.
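- Conditional shift is nonnegligible, but its strength can often be bounded by that of the observable covariate shift; this pattern appears only when both shifts are quantified with the paper's standardized, pivotal measures.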
- Using covariate shift for prediction improves uncertainty quantification in generalization tasks.

## Abstract

Generalizing scientific findings across diverse populations is fundamental to many scientific fields and to the policymakers who use such analyses to guide decision-making. Traditional methods often account for the differences in populations by reweighting observed covariates, assuming that only observed variables shift across the populations. Analyzing large-scale replication studies in the social sciences, we empirically demonstrate that i) shifts in unobserved variables are common, but ii) such shifts can be predicted from shifts in observed covariates. We propose a statistical theory of distributional shifts to explain this predictive, rather than merely explanatory, role of covariates in effect generalization. Our results serve as the empirical and conceptual basis for developing new statistical methods for generalizability and external validity.

Many existing approaches to generalizing statistical inference amid distribution shift operate under the covariate shift assumption, which posits that the conditional distribution of unobserved variables given observable ones is invariant across populations. However, recent empirical investigations have demonstrated that adjusting for shifts in observed variables (covariate shift) is often insufficient for generalization. In other words, covariate shift does not typically “explain away” the distribution shift between populations. As such, addressing the unknown yet nonnegligible shift in the unobserved variables given observed ones (conditional shift) is crucial for generalizable inference. In this paper, we present empirical evidence from two large-scale multisite replication studies indicating that covariate shift can help predict the strength of unknown conditional shift. Analyzing 680 studies across 65 sites, we find that even though the conditional shift is nonnegligible, its strength can often be bounded by that of the observable covariate shift. This pattern only emerges when the two sources of shifts are quantified by our proposed standardized, pivotal measures. We then interpret this phenomenon by connecting it to similar patterns that can be theoretically derived from a random distribution shift model. Finally, we demonstrate that exploiting the predictive role of covariate shift leads to reliable and efficient uncertainty quantification for target estimates in generalization tasks with partially observed data. Overall, our empirical and theoretical analyses highlight an alternative perspective on the problem of distributional shift, generalizability, and external validity.
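The distinction between covariate shift and conditional shift can be made concrete with a small sketch. The code below is an illustration only, not the authors' method: it splits the raw difference in mean outcome between a source and a target sample into the part removed by reweighting on a single discrete covariate and the part that remains. The function names, the toy data, and the unstandardized decomposition are all assumptions made for this example; the paper's standardized, pivotal measures are not reproduced here.

```python
from collections import defaultdict


def group_means(rows):
    """Return ({x: mean of y}, {x: share of rows}) for a list of (x, y) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in rows:
        sums[x] += y
        counts[x] += 1
    n = len(rows)
    means = {x: sums[x] / counts[x] for x in counts}
    shares = {x: counts[x] / n for x in counts}
    return means, shares


def decompose_shift(source, target):
    """Split the raw mean difference into covariate and conditional parts.

    Hypothetical helper for illustration. Assumes every covariate value
    seen in the target also appears in the source (overlap); otherwise
    the lookup of the source conditional mean raises KeyError.
    """
    src_means, src_shares = group_means(source)
    tgt_means, tgt_shares = group_means(target)

    source_mean = sum(src_shares[x] * src_means[x] for x in src_shares)
    target_mean = sum(tgt_shares[x] * tgt_means[x] for x in tgt_shares)

    # Reweighting estimate: source conditional means averaged over the
    # target covariate distribution. It is correct only if E[Y | X] is
    # the same in both populations (the covariate shift assumption).
    reweighted = sum(tgt_shares[x] * src_means[x] for x in tgt_shares)

    return {
        "source_mean": source_mean,
        "reweighted_mean": reweighted,
        "target_mean": target_mean,
        "covariate_shift": reweighted - source_mean,
        "conditional_shift": target_mean - reweighted,
    }


# Toy data (invented): the covariate mix changes from 30% to 70% with
# x = 1, and the outcome given x is 0.2 higher in the target.
source = [(0, 1.0)] * 7 + [(1, 2.0)] * 3
target = [(0, 1.2)] * 3 + [(1, 2.2)] * 7

result = decompose_shift(source, target)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup, reweighting accounts for roughly 0.4 of the 0.6 gap in means, and the remaining 0.2 is conditional shift that no reweighting on the observed covariate can remove. The abstract's claim concerns how the size of that remainder relates to the size of the observable part once both are suitably standardized.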

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12625858/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12625858/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/PMC12625858/full.md

---
Source: https://tomesphere.com/paper/PMC12625858