Is External Information Useful for Data Fusion? An Evaluation before Acquisition
Guorong Dai, Lingxuan Shao, Jinbo Chen

TL;DR
This paper introduces a method to evaluate the potential benefit of external information for data fusion using internal data alone, enabling cost-effective decisions on acquiring external data before actual collection.
Contribution
It proposes a universal utility measure based on efficiency bounds and a methodology to estimate it using only internal data, applicable across various external information types.
Findings
The utility measure accurately predicts efficiency gains from external data.
The asymptotic properties of the proposed estimators are established theoretically.
Simulations and real data analyses demonstrate the practical effectiveness of the method.
Abstract
We consider a general statistical estimation problem involving a finite-dimensional target parameter vector. Beyond an internal data set drawn from the population distribution, external information, such as additional individual data or summary statistics, can potentially improve the estimation when incorporated via appropriate data fusion techniques. However, since acquiring external information often incurs costs, it is desirable to assess its utility beforehand using only the internal data. To address this need, we introduce a utility measure based on estimation efficiency, defined as the ratio of semiparametric efficiency bounds for estimating the target parameters with versus without incorporating the external information. It quantifies the maximum potential efficiency improvement offered by the external information, independent of specific estimation methods. To enable inference…
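To make the efficiency-bound ratio concrete, here is a minimal sketch for one textbook special case, not the paper's general estimator. It assumes the target is the mean of an outcome Y and the external information is the exact population mean of a scalar covariate X. In that case the efficiency bound without the external information is Var(Y), and with it the bound is Var(Y) - Cov(Y, X)^2 / Var(X), so the ratio equals 1 - Corr(Y, X)^2 and can be estimated from the internal sample alone. The function name `utility_ratio_known_covariate_mean` and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def utility_ratio_known_covariate_mean(y, x):
    """Estimate the with/without efficiency-bound ratio for E[Y].

    Assumed special case (not the paper's general method): the external
    information is the exact population mean of a scalar covariate X.
    Values near 1 mean little potential gain; values near 0 mean large gain.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    var_y = y.var(ddof=1)
    var_x = x.var(ddof=1)
    cov_yx = np.cov(y, x, ddof=1)[0, 1]
    bound_without = var_y                        # bound using internal data only
    bound_with = var_y - cov_yx ** 2 / var_x     # bound when E[X] is known
    return bound_with / bound_without


# Simulated internal data (hypothetical): Corr(Y, X) is 0.8 by construction,
# so the population ratio is 1 - 0.8**2 = 0.36.
rng = np.random.default_rng(0)
n = 5000
rho = 0.8
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.normal(size=n)

ratio = utility_ratio_known_covariate_mean(y, x)
print(f"estimated efficiency-bound ratio: {ratio:.3f}")
```

Under these assumptions, a ratio well below 1 suggests that acquiring the covariate mean could be worthwhile, while a ratio close to 1 suggests the external information would add little, whichever fusion estimator is used afterwards.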
