Is merging worth it? Securely evaluating the information gain for causal dataset acquisition
Jake Fawkes, Lucile Ter-Minassian, Desi Ivanova, Uri Shalit, Chris Holmes

TL;DR
This paper introduces a cryptographically secure, privacy-preserving method for evaluating the potential benefit of merging datasets for causal effect estimation without revealing sensitive data, so that data hosts can decide in advance which merges are worth pursuing.
Contribution
It presents the first secure, information-theoretic approach for assessing dataset merge value in causal inference, combining multi-party computation with differential privacy.
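To make the two ingredients concrete, here is a minimal sketch of additive secret sharing (one common building block of multi-party computation) followed by the Laplace mechanism for differential privacy. It is a generic illustration, not the paper's protocol: the modulus `PRIME`, the helper names `share`, `reconstruct`, `secure_sum`, and `laplace_mechanism`, and the example values are all assumptions introduced here.

```python
import secrets
import numpy as np

# Assumed modulus for the illustration; shares are integers modulo PRIME.
PRIME = 2**61 - 1

def share(value, n_parties):
    """Split a non-negative integer into n_parties additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine additive shares; any strict subset of shares looks uniformly random."""
    return sum(shares) % PRIME

def secure_sum(private_values, n_parties=2):
    """Sum private inputs so that only the total is ever reconstructed."""
    # Each data owner splits its value and sends one share to each party.
    all_shares = [share(v, n_parties) for v in private_values]
    # Party j adds up the j-th shares it received (a local computation).
    partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    return reconstruct(partial_sums)

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to a released statistic."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical example: two hosts hold private counts of 120 and 85.
total = secure_sum([120, 85])
rng = np.random.default_rng(0)
noisy_total = laplace_mechanism(total, sensitivity=1.0, epsilon=0.5, rng=rng)
```

In this sketch the exact total is reconstructed before noise is added; a deployment that needs the released value itself to be differentially private would typically add the noise inside the secure computation.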
Findings
Effective in simulated benchmarks
Reliable in realistic scenarios
Preserves privacy while maintaining accuracy
Abstract
Merging datasets across institutions is a lengthy and costly procedure, especially when it involves private information. Data hosts may therefore want to prospectively gauge which datasets are most beneficial to merge with, without revealing sensitive information. For causal estimation this is particularly challenging as the value of a merge depends not only on reduction in epistemic uncertainty but also on improvement in overlap. To address this challenge, we introduce the first cryptographically secure information-theoretic approach for quantifying the value of a merge in the context of heterogeneous treatment effect estimation. We do this by evaluating the Expected Information Gain (EIG) using multi-party computation to ensure that no raw data is revealed. We further demonstrate that our approach can be combined with differential privacy (DP) to meet arbitrary privacy requirements…
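The interplay between epistemic uncertainty and overlap can be illustrated with a small sketch. It assumes a conjugate Bayesian linear model with known noise variance, for which the information gain about the parameters has the closed form 0.5 * (logdet of the updated posterior precision minus logdet of the current one) and does not depend on the unseen outcomes. The feature map, sample sizes, and function names are hypothetical, the computation is done in the clear rather than under multi-party computation, and the model is not claimed to be the one used in the paper.

```python
import numpy as np

def features(x, t):
    """Hypothetical feature map: intercept, covariate, treatment, interaction."""
    return np.column_stack([np.ones_like(x), x, t, t * x])

def posterior_precision(prior_precision, phi, noise_var):
    """Posterior precision of the weights in a Gaussian linear model."""
    return prior_precision + phi.T @ phi / noise_var

def expected_information_gain(current_precision, phi_new, noise_var):
    """EIG about the weights from observing outcomes at the design phi_new."""
    updated = current_precision + phi_new.T @ phi_new / noise_var
    _, logdet_new = np.linalg.slogdet(updated)
    _, logdet_old = np.linalg.slogdet(current_precision)
    return 0.5 * (logdet_new - logdet_old)

rng = np.random.default_rng(0)
n, noise_var = 500, 1.0

# Host data with poor overlap: units are treated only when x > 0.
x_host = rng.normal(size=n)
t_host = (x_host > 0).astype(float)
host_precision = posterior_precision(np.eye(4), features(x_host, t_host), noise_var)

# Candidate A: same assignment rule as the host, so overlap does not improve.
x_a = rng.normal(size=n)
phi_a = features(x_a, (x_a > 0).astype(float))

# Candidate B: randomised treatment, so both arms cover the whole covariate range.
x_b = rng.normal(size=n)
phi_b = features(x_b, rng.integers(0, 2, size=n).astype(float))

eig_same = expected_information_gain(host_precision, phi_a, noise_var)
eig_randomised = expected_information_gain(host_precision, phi_b, noise_var)
print(f"EIG, same assignment rule: {eig_same:.3f}")
print(f"EIG, randomised treatment: {eig_randomised:.3f}")
```

Both candidates have the same size, yet the randomised one yields the larger gain because it supplies treated and control units in regions where the host has none, which is the overlap effect the abstract describes.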
Taxonomy
Topics: Bayesian Modeling and Causal Inference
