Record fusion: A learning approach
Alireza Heidari, George Michalopoulos, Shrinu Kushagra, Ihab F. Ilyas, Theodoros Rekatsinas

TL;DR
This paper introduces a machine learning approach to record fusion that uses a novel stagewise additive model to merge database records representing the same entity, reporting roughly 94% precision without source information and roughly 98% with it.
Contribution
The paper presents a new stagewise additive learning model for record fusion, combining multiple signals and deep transformations to improve accuracy over existing methods.
Findings
Achieves ~98% precision with source info
Achieves ~94% precision without source info
Outperforms existing data fusion methods by 20-45%
Abstract
Record fusion is the task of aggregating multiple records that correspond to the same real-world entity in a database. We can view record fusion as a machine learning problem where the goal is to predict the "correct" value for each attribute for each entity. Given a database, we use a combination of attribute-level, record-level, and database-level signals to construct a feature vector for each cell (or (row, col)) of that database. We use this feature vector along with the ground-truth information to learn a classifier for each of the attributes of the database. Our learning algorithm uses a novel stagewise additive model. At each stage, we construct a new feature vector by combining a part of the original feature vector with features computed by the predictions from the previous stage. We then learn a softmax classifier over the new feature space. This greedy stagewise approach can…
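The stagewise idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which part of the original feature vector is kept at each stage, how features are derived from the previous predictions, or how the classifier is trained. The sketch below assumes that the kept part is a fixed column subset (`keep`), that the previous stage's class probabilities are appended as the new features, and that each softmax classifier is fit with plain batch gradient descent. All function names (`fit_softmax`, `fit_stagewise`, `predict_stagewise`) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.1, epochs=200):
    # Multinomial logistic regression trained by batch gradient descent
    # (an assumed training procedure, chosen for brevity).
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]           # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ W + b)
        G = (P - Y) / n                # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def fit_stagewise(X, y, n_classes, n_stages=3, keep=None):
    # Greedy stagewise fit: each stage sees the kept original columns
    # plus the class probabilities predicted by the previous stage.
    keep = slice(None) if keep is None else keep
    base = X[:, keep]
    feats = base                       # the first stage has no previous predictions
    stages = []
    for _ in range(n_stages):
        W, b = fit_softmax(feats, y, n_classes)
        stages.append((W, b))
        probs = softmax(feats @ W + b)
        feats = np.hstack([base, probs])
    return stages

def predict_stagewise(stages, X, keep=None):
    # Replay the stages in order and return the final stage's labels.
    keep = slice(None) if keep is None else keep
    base = X[:, keep]
    feats = base
    for W, b in stages:
        probs = softmax(feats @ W + b)
        feats = np.hstack([base, probs])
    return probs.argmax(axis=1)
```

In the record fusion setting, each row of `X` would correspond to the feature vector of one cell and `y` to its ground-truth label for a given attribute, with one such model trained per attribute. Earlier stages are frozen once fit, which is what makes the procedure greedy.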
Taxonomy
Topics: Data Quality and Management · Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance
Methods: Softmax
