Cross-Balancing for Data-Informed Design and Efficient Analysis of Observational Studies

Ying Jin; José Zubizarreta

arXiv:2511.15896 · stat.ME · November 21, 2025

Open Access

TL;DR

This paper introduces cross-balancing, a sample-splitting method that brings outcome data into the design of observational studies, improving covariate adjustment and effect estimation while preserving valid inference.

Contribution

The paper proposes cross-balancing, a sample-splitting approach that incorporates outcome information into covariate adjustment while maintaining valid inference. The approach covers both learned features and features selected from a potentially high-dimensional dictionary.

Findings

01. Cross-balancing reduces bias and improves efficiency in effect estimation.

02. The method is robust to slow convergence of learned features.

03. Simulations and real data show substantial improvements over traditional methods.

Abstract

Causal inference starts with a simple idea: compare groups that differ by treatment, not much else. Traditionally, similar groups are constructed using only observed covariates; however, it remains a long-standing challenge to incorporate available outcome data into the study design while preserving valid inference. In this paper, we study the general problem of covariate adjustment, effect estimation, and statistical inference when balancing features are constructed or selected with the aid of outcome information from the data. We propose cross-balancing, a method that uses sample splitting to separate the error in feature construction from the error in weight estimation. Our framework addresses two cases: one where the features are learned functions and one where they are selected from a potentially high-dimensional dictionary. In both cases, we establish mild and general conditions…
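The sample-splitting idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm; it only shows the general pattern of learning a balancing feature from outcomes on one fold and estimating weights on the other. Everything specific here is an assumption made for illustration: the function names (`simulate`, `min_norm_balancing_weights`, `cross_balancing_att`), the linear outcome model used as the learned feature, the minimum-norm weights (which may be negative), the choice of the effect on the treated as the estimand, and the simulated data.

```python
import numpy as np


def simulate(n=4000, effect=2.0, seed=1):
    """Hypothetical confounded data with a constant treatment effect."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))
    propensity = 1.0 / (1.0 + np.exp(-x[:, 0]))  # treatment depends on x[:, 0]
    t = (rng.uniform(size=n) < propensity).astype(int)
    y = x @ np.array([1.0, -0.5, 0.25]) + effect * t + rng.normal(size=n)
    return x, t, y


def min_norm_balancing_weights(features, target):
    """Minimum-norm weights w satisfying features.T @ w == target.

    Assumption: exact balance with no sign constraint, so weights can be negative.
    """
    gram = features.T @ features
    return features @ np.linalg.solve(gram, target)


def cross_balancing_att(x, t, y, n_folds=2, seed=0):
    """Sketch: learn a feature on the other folds, balance it on the held-out fold."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % n_folds  # random, equally sized folds
    estimates = []
    for k in range(n_folds):
        train_c = (fold != k) & (t == 0)  # controls used to learn the feature
        held_c = (fold == k) & (t == 0)   # controls that receive weights
        held_t = (fold == k) & (t == 1)   # treated units that define the target

        # Step 1: feature construction uses outcomes, but only outside fold k.
        design = np.column_stack([np.ones(train_c.sum()), x[train_c]])
        beta, *_ = np.linalg.lstsq(design, y[train_c], rcond=None)

        def learned_feature(z):
            return np.column_stack([np.ones(len(z)), z]) @ beta

        # Step 2: weights on fold k balance [1, learned feature] toward the treated.
        feats_c = np.column_stack([np.ones(held_c.sum()), learned_feature(x[held_c])])
        target = np.array([1.0, learned_feature(x[held_t]).mean()])
        w = min_norm_balancing_weights(feats_c, target)

        estimates.append(y[held_t].mean() - w @ y[held_c])
    return float(np.mean(estimates))


if __name__ == "__main__":
    x, t, y = simulate()
    naive = y[t == 1].mean() - y[t == 0].mean()
    print("naive difference in means:", naive)
    print("cross-fitted balancing estimate:", cross_balancing_att(x, t, y))
```

Because the feature is fit on one fold and the weights on another, the error in the learned feature does not depend on the units being weighted, which is the separation the abstract describes. The conditions under which such an estimator supports valid inference are those established in the paper, not by this sketch.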

Taxonomy

Topics: Advanced Causal Inference Techniques · Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference