Categorical Data Fusion Using Auxiliary Information
Bailey K. Fosdick, Maria DeYoreo, Jerome P. Reiter

TL;DR
This paper introduces a data fusion method that uses auxiliary information on the dependence between variables never observed jointly, called glue, to combine two databases covering disjoint sets of individuals. A case study fuses two HarperCollins marketing surveys using glue from CivicScience.
Contribution
It presents a data fusion technique that leverages auxiliary information on the dependence between variables not observed jointly, relaxing the conditional independence assumptions that most fusion techniques rely on.
Findings
The fused data make it possible to estimate associations between people's preferences for authors and their preferences for learning about new books.
The method fused two HarperCollins marketing surveys using glue from the online, rapid-response polling company CivicScience.
The case study demonstrates the practical value of auxiliary information in data fusion.
Abstract
In data fusion, analysts seek to combine information from two databases composed of disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variants of conditional independence assumptions. When inappropriate, these assumptions can result in unreliable inferences. We propose a data fusion technique that allows analysts to easily incorporate auxiliary information on the dependence structure of variables not observed jointly; we refer to this auxiliary information as glue. With this technique, we fuse two marketing surveys from the book publisher HarperCollins using glue from the online, rapid-response polling company CivicScience. The fused data enable estimation of associations between people's preferences for authors and for learning about new books. The analysis also serves as a…
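To make the conditional independence assumption mentioned in the abstract concrete, here is a minimal sketch of the baseline it describes, not the authors' method. It assumes a shared variable X seen in both databases, a variable A seen only in the first, and a variable B seen only in the second. All names and numbers are hypothetical and invented purely for illustration.

```python
import numpy as np

def fuse_conditional_independence(p_x, p_a_given_x, p_b_given_x):
    """Build P(X, A, B) assuming A and B are independent given X.

    p_x:          shape (nx,)     marginal of the shared variable
    p_a_given_x:  shape (nx, na)  estimable from the first database
    p_b_given_x:  shape (nx, nb)  estimable from the second database
    """
    return p_x[:, None, None] * p_a_given_x[:, :, None] * p_b_given_x[:, None, :]

def conditional_odds_ratio(joint, x):
    """Odds ratio between binary A and B within level x of X."""
    t = joint[x]
    return (t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0])

# Toy inputs (hypothetical, not taken from the paper)
p_x = np.array([0.6, 0.4])
p_a_given_x = np.array([[0.7, 0.3], [0.2, 0.8]])
p_b_given_x = np.array([[0.5, 0.5], [0.1, 0.9]])

joint = fuse_conditional_independence(p_x, p_a_given_x, p_b_given_x)

# Under this assumption the A-B odds ratio within each level of X is
# 1 by construction, whatever the true dependence may be.
for x in range(len(p_x)):
    print(x, conditional_odds_ratio(joint, x))
```

Because neither database observes A and B together, nothing in the two sources alone can move that within-X association away from independence. This is the gap that auxiliary information such as glue is meant to fill.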
