Robust Group Linkage
Pei Li, Xin Luna Dong, Songtao Guo, Andrea Maurino, Divesh Srivastava

TL;DR
This paper introduces a robust two-stage algorithm for group linkage that effectively handles value diversity, large data sizes, and errors, demonstrating high accuracy and efficiency on real-world datasets.
Contribution
The paper proposes a novel two-stage group linkage algorithm that improves robustness to errors and value diversity, and scales efficiently to large datasets.
Findings
High accuracy in linking groups in real-world data
Effective handling of erroneous and diverse attribute values
Scalable performance on large datasets
Abstract
We study the problem of group linkage: linking records that refer to entities in the same group. Applications for group linkage include finding businesses in the same chain, finding conference attendees from the same affiliation, finding players from the same team, etc. Group linkage faces challenges not present for traditional record linkage. First, although different members in the same group can share some similar global values of an attribute, they represent different entities so can also have distinct local values for the same or different attributes, requiring a high tolerance for value diversity. Second, groups can be huge (with tens of thousands of records), requiring high scalability even after using good blocking strategies. We present a two-stage algorithm: the first stage identifies cores containing records that are very likely to belong to the same group, while being…
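As a rough illustration of the first stage only, the sketch below forms cores by merging records that share a value on an attribute treated as strong evidence. It is a naive stand-in, not the authors' algorithm: the function `find_cores`, the `strong_keys` parameter, the choice of `phone` and `url` as strong attributes, and the sample records are all assumptions made for this example. Plain connected components like these are also not tolerant of erroneous values, which the paper's cores are designed to handle.

```python
from collections import defaultdict


def find_cores(records, strong_keys):
    """Group record ids that share a non-empty value on any strong key.

    Hypothetical helper: a simple union-find over shared values,
    used here only to illustrate what a "core" might look like.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index records by (attribute, value) to avoid comparing all pairs.
    index = defaultdict(list)
    for rid, rec in records.items():
        for key in strong_keys:
            value = rec.get(key)
            if value:
                index[(key, value)].append(rid)

    # Records sharing the same strong value land in the same core.
    for rids in index.values():
        for other in rids[1:]:
            union(rids[0], other)

    cores = defaultdict(set)
    for rid in records:
        cores[find(rid)].add(rid)
    return sorted((sorted(c) for c in cores.values()), key=lambda c: c[0])


# Made-up business listings; names and values are illustrative only.
records = {
    "r1": {"name": "Taco Casa", "phone": "555-0100", "url": "tacocasa.example"},
    "r2": {"name": "Taco Casa #2", "phone": "555-0100", "url": None},
    "r3": {"name": "Taco Casa Northport", "phone": "555-0199", "url": "tacocasa.example"},
    "r4": {"name": "Elm Street Diner", "phone": "555-0142", "url": None},
}
cores = find_cores(records, strong_keys=("phone", "url"))
print(cores)
```

Indexing by shared value keeps the work close to linear in the number of records, which matters when groups contain tens of thousands of members; a pairwise comparison would not scale the same way.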
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Data-Driven Disease Surveillance
