Estimating the Joint Distribution of Two Binary Variables with Marginal Statistics
Longwen Shang, Min Tsao, Xuekui Zhang

TL;DR
This paper introduces a maximum-likelihood method for estimating the joint distribution of two binary variables from marginal summary data alone, addressing the privacy and data-access constraints that arise in clinical trial simulation.
Contribution
It presents a novel approach that estimates the joint distribution of two binary variables from marginal summary data, enabling realistic clinical trial simulations without requiring individual-level data.
Findings
The method achieves high accuracy across diverse simulation scenarios
The method remains robust across varying sample sizes and data conditions
An application to real data shows practical utility and improved simulation realism
Abstract
Clinical trial simulation (CTS) is critical in new drug development, providing insight into safety and efficacy while guiding trial design. Achieving realistic outcomes in CTS requires an accurately estimated joint distribution of the underlying variables. However, privacy concerns and data availability issues often restrict researchers to marginal summary-level data of each variable, making it challenging to estimate the joint distribution due to the lack of access to individual-level data or relational summaries between variables. We propose a novel approach based on the method of maximum likelihood that estimates the joint distribution of two binary variables using only marginal summary data. By leveraging numerical optimization and accommodating varying sample sizes across studies, our method preserves privacy while bypassing the need for granular or relational data. Through an…
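To make the idea in the abstract concrete, here is a minimal sketch of a marginal-data likelihood for two binary variables. It is not the authors' implementation. It assumes each study reports only its sample size n and the marginal counts x = #(X=1) and y = #(Y=1), and that subjects are drawn independently from a common 2x2 distribution (p11, p10, p01, p00). Under those assumptions the unobserved cell count a = #(X=1, Y=1) can be summed out of the multinomial likelihood, and the product over studies can be maximized numerically. As a simplification, the sketch plugs in the pooled marginal proportions and searches a grid over the single remaining parameter p11 instead of running a full constrained optimizer. The names `estimate_joint`, `_feasible_terms`, `_log_lik`, and the `(n, x, y)` tuple format are hypothetical.

```python
import math


def _feasible_terms(n, x, y):
    """Enumerate every 2x2 table consistent with the margins (n, x, y).

    The count a = #(X=1, Y=1) is unobserved; each feasible value of a
    fixes the other three cells. Returns one tuple per feasible table:
    (log multinomial coefficient, a, b, c, d).
    """
    terms = []
    for a in range(max(0, x + y - n), min(x, y) + 1):
        b, c, d = x - a, y - a, n - x - y + a
        log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in (a, b, c, d))
        terms.append((log_coef, a, b, c, d))
    return terms


def _log_lik(tables, p11, p10, p01, p00):
    """Sum over studies of log P(x, y | n), with a summed out (log-sum-exp)."""
    l11, l10, l01, l00 = (math.log(p) for p in (p11, p10, p01, p00))
    total = 0.0
    for terms in tables:
        vals = [lc + a * l11 + b * l10 + c * l01 + d * l00 for lc, a, b, c, d in terms]
        m = max(vals)
        total += m + math.log(sum(math.exp(v - m) for v in vals))
    return total


def estimate_joint(studies, grid=200):
    """Estimate (p11, p10, p01, p00) from a list of (n, x, y) study summaries.

    Simplifying assumption of this sketch: P(X=1) and P(Y=1) are fixed at
    their pooled proportions, so only p11 is searched, on an interior grid
    of its feasible interval [max(0, px + py - 1), min(px, py)].
    """
    total_n = sum(n for n, _, _ in studies)
    px = sum(x for _, x, _ in studies) / total_n
    py = sum(y for _, _, y in studies) / total_n
    lo, hi = max(0.0, px + py - 1.0), min(px, py)
    tables = [_feasible_terms(n, x, y) for n, x, y in studies]

    best, best_ll = None, -math.inf
    for i in range(1, grid):
        p11 = lo + (hi - lo) * i / grid
        cells = (p11, px - p11, py - p11, 1.0 - px - py + p11)
        if min(cells) <= 0.0:  # guard against log(0) from rounding
            continue
        ll = _log_lik(tables, *cells)
        if ll > best_ll:
            best, best_ll = cells, ll
    if best is None:
        raise ValueError("degenerate margins: no interior value of p11 is feasible")
    return best


if __name__ == "__main__":
    # Hypothetical marginal summaries (n, x, y) from four studies.
    example = [(50, 27, 21), (80, 41, 38), (35, 16, 17), (120, 63, 52)]
    print(estimate_joint(example))
```

Because study sizes enter the likelihood term by term, studies with different n are handled without any reweighting. With only a handful of studies the dependence between the variables is weakly identified, since the information about p11 comes from how x and y co-vary across studies, so a sketch like this becomes informative only when many summaries are available.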
Taxonomy
Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Privacy-Preserving Technologies in Data
