Is merging worth it? Securely evaluating the information gain for causal dataset acquisition

Jake Fawkes; Lucile Ter-Minassian; Desi Ivanova; Uri Shalit; Chris Holmes

arXiv:2409.07215 · stat.ML · July 3, 2025

Open Access

TL;DR

This paper introduces a cryptographically secure, privacy-preserving method for evaluating how much a dataset merge would improve causal effect estimation, without revealing sensitive data, so that data hosts can decide in advance which merges are worth pursuing.

Contribution

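The paper presents the first cryptographically secure, information-theoretic approach for quantifying the value of a dataset merge for heterogeneous treatment effect estimation. It evaluates the Expected Information Gain (EIG) under multi-party computation and can be combined with differential privacy.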

Findings

01 Effective in simulated benchmarks

02 Reliable in realistic scenarios

03 Preserves privacy while maintaining accuracy

Abstract

Merging datasets across institutions is a lengthy and costly procedure, especially when it involves private information. Data hosts may therefore want to prospectively gauge which datasets are most beneficial to merge with, without revealing sensitive information. For causal estimation this is particularly challenging as the value of a merge depends not only on reduction in epistemic uncertainty but also on improvement in overlap. To address this challenge, we introduce the first cryptographically secure information-theoretic approach for quantifying the value of a merge in the context of heterogeneous treatment effect estimation. We do this by evaluating the Expected Information Gain (EIG) using multi-party computation to ensure that no raw data is revealed. We further demonstrate that our approach can be combined with differential privacy (DP) to meet arbitrary privacy requirements…
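
To make the idea of Expected Information Gain concrete, here is a minimal sketch in a toy conjugate Gaussian model. It is not the paper's model or protocol: it runs in the clear, with no multi-party computation or differential privacy. The assumptions are that each covariate stratum has its own independent mean treatment effect with a Gaussian prior, and that outcome noise is Gaussian with known variance. The function names (`eig_per_stratum`, `merge_eig`), the parameters (`prior_var`, `noise_var`), and the sample counts are all hypothetical and chosen for illustration only.

```python
import math


def eig_per_stratum(n_host, n_cand, prior_var=1.0, noise_var=1.0):
    """EIG (in nats) about one stratum's mean effect from n_cand new samples.

    Toy assumption: effect ~ N(0, prior_var), outcomes ~ N(effect, noise_var).
    """
    # Posterior precision held by the host after its own n_host samples.
    host_precision = 1.0 / prior_var + n_host / noise_var
    # In a conjugate Gaussian model the entropy reduction depends only on
    # sample counts, not on the outcome values, so the EIG has a closed form.
    return 0.5 * math.log(1.0 + n_cand / (noise_var * host_precision))


def merge_eig(host_counts, cand_counts, prior_var=1.0, noise_var=1.0):
    """Total EIG of a merge, summed over independent strata."""
    return sum(
        eig_per_stratum(h, c, prior_var, noise_var)
        for h, c in zip(host_counts, cand_counts)
    )


# Hypothetical host: many samples in stratum 0, very few in stratum 1.
host = [200, 5]
cand_same = [100, 0]  # candidate duplicates the host's well-covered stratum
cand_new = [0, 100]   # candidate covers the stratum the host barely observes

eig_same = merge_eig(host, cand_same)
eig_new = merge_eig(host, cand_new)
print(f"EIG, same-region candidate: {eig_same:.3f} nats")
print(f"EIG, new-region candidate:  {eig_new:.3f} nats")
```

In this toy setting the candidate that fills the host's sparsely observed stratum yields the larger gain, which loosely mirrors the abstract's point that the value of a merge depends on coverage of the covariate space as well as on sample size.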

Taxonomy

Topics: Bayesian Modeling and Causal Inference