Borrowing Information from an Unidentifiable Model: Guaranteed Efficiency Gain with a Dichotomized Outcome in the External Data
Lu Wang, Yanyuan Ma, Jiwei Zhao

TL;DR
This paper develops methods to integrate continuous and dichotomized outcome data from different sources, improving statistical efficiency and robustness without assuming identical measurement scales or a correctly specified error distribution.
Contribution
It introduces two novel estimators for combining primary continuous outcome data with external dichotomized data, ensuring consistency and efficiency gains under minimal assumptions.
Findings
The first estimator remains consistent even if the error distribution is misspecified.
The second estimator guarantees an efficiency gain over using only primary data.
Simulation studies show robustness and improved efficiency across various scenarios.
Abstract
In the era of big data, the increasing availability of diverse data sources has driven interest in analytical approaches that integrate information across sources to enhance statistical accuracy, efficiency, and scientific insights. Many existing methods assume exchangeability among data sources and often implicitly require that sources measure identical covariates or outcomes, or that the error distribution is correctly specified; these assumptions may not hold in complex real-world scenarios. This paper explores the integration of data from sources with distinct outcome scales, focusing on leveraging external data to improve statistical efficiency. Specifically, we consider a scenario where the primary dataset includes a continuous outcome, and external data provides a dichotomized version of the same outcome. We propose two novel estimators: the first estimator remains asymptotically…
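To make the data setting concrete, here is a minimal simulation sketch. It is not the authors' estimator. It assumes a linear model with Gaussian errors, and the function name `simulate_sources` and the parameters `beta`, `sigma`, and `cutoff` are hypothetical, chosen only for illustration. It also shows one plausible reading of "unidentifiable" in the title: rescaling the coefficients, error scale, and cutoff together leaves the dichotomized external outcome unchanged, while the continuous primary outcome does change.

```python
import numpy as np

def simulate_sources(beta, sigma, cutoff, n_primary=200, n_external=1000, seed=0):
    """Simulate a primary sample (continuous outcome) and an external
    sample (outcome observed only as an indicator of exceeding `cutoff`).

    Assumption for illustration: Y = X @ beta + sigma * e, with e ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)

    # Primary source: covariates and the continuous outcome are both observed.
    x_primary = rng.normal(size=(n_primary, beta.size))
    y_primary = x_primary @ beta + sigma * rng.normal(size=n_primary)

    # External source: the continuous outcome is latent; only 1{Y > cutoff} is kept.
    x_external = rng.normal(size=(n_external, beta.size))
    y_latent = x_external @ beta + sigma * rng.normal(size=n_external)
    d_external = (y_latent > cutoff).astype(int)

    return (x_primary, y_primary), (x_external, d_external)

# Two parameter settings that differ by a common factor of 2.
(_, y_a), (_, d_a) = simulate_sources([1.0, -0.5], sigma=1.0, cutoff=0.5)
(_, y_b), (_, d_b) = simulate_sources([2.0, -1.0], sigma=2.0, cutoff=1.0)

# The external (dichotomized) data cannot tell the two settings apart...
same_external = bool(np.array_equal(d_a, d_b))
# ...but the primary (continuous) data can.
same_primary = bool(np.allclose(y_a, y_b))
```

Under these assumptions the external data alone pin down the parameters only up to a common scale, which is why the continuous primary outcome is needed to anchor that scale.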
