Multidimensional Assignment Problem for multipartite entity resolution
Alla Kammerdiner, Alexander Semenov, Eduardo Pasiliao

TL;DR
This paper formulates multipartite entity resolution as a multidimensional assignment problem, compares heuristic algorithms for solving it, and demonstrates the effectiveness of a hybrid approach with multi-start strategies on synthetic data.
Contribution
It introduces a mathematical formulation for multipartite entity resolution as a multidimensional assignment problem and evaluates heuristic algorithms, including hybrid methods, for large datasets.
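For orientation, the multidimensional assignment problem is usually written in its axial form as below. This is the standard textbook statement, given here as a sketch; the paper's own formulation for record linkage may differ in detail. The symbols are assumptions introduced for illustration: D is the number of datasets, n the number of records per dataset, c the cost of grouping one record from each dataset into a single entity, and x the binary decision variable.

```latex
\begin{align}
\min_{x}\quad & \sum_{i_1=1}^{n} \cdots \sum_{i_D=1}^{n}
    c_{i_1 \dots i_D}\, x_{i_1 \dots i_D} \\
\text{s.t.}\quad
  & \sum_{\substack{i_1,\dots,i_D \\ i_d = k}} x_{i_1 \dots i_D} = 1,
    \qquad d = 1,\dots,D,\quad k = 1,\dots,n, \\
  & x_{i_1 \dots i_D} \in \{0, 1\}.
\end{align}
```

Each constraint says that record k of dataset d belongs to exactly one selected tuple, so a feasible solution partitions the records into n entities with one record from each dataset.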
Findings
Very large-scale neighborhood search outperforms the Greedy heuristic.
Design-based multi-start improves efficiency on large datasets.
The hybrid heuristic improves assignment accuracy.
Abstract
Multipartite entity resolution aims at integrating records from multiple datasets into one entity. We derive a mathematical formulation for a general class of record linkage problems in multipartite entity resolution across many datasets as a combinatorial optimization problem known as the multidimensional assignment problem. As a motivation for our approach, we illustrate the advantage of multipartite entity resolution over sequential bipartite matching. Because the optimization problem is NP-hard, we apply two heuristic procedures, a Greedy algorithm and very large scale neighborhood search, to solve the assignment problem and find the most likely matching of records from multiple datasets into a single entity. We evaluate and compare the performance of these algorithms and their modifications on synthetically generated data. We perform computational experiments to compare performance…
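The Greedy procedure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes every dataset holds the same number of records `n`, and that `cost` is a dictionary mapping each index tuple (one record index per dataset) to a matching cost, for example a negative log-likelihood. The function name `greedy_map` and its parameters are hypothetical.

```python
from itertools import product


def greedy_map(cost, n, dims):
    """Greedy heuristic for an axial multidimensional assignment problem.

    cost: dict mapping index tuples of length `dims` to a numeric cost
          (assumed to contain every tuple in range(n) ** dims).
    n:    number of records in each dataset (assumed equal across datasets).
    dims: number of datasets.
    Returns (assignment, total_cost), where assignment is a list of n tuples.
    """
    # Enumerate all candidate tuples and sort them from cheapest to costliest.
    # This is exponential in `dims`, so it is only practical for small inputs.
    candidates = sorted(product(range(n), repeat=dims), key=lambda t: cost[t])

    # used[d] holds the record indices of dataset d that are already matched.
    used = [set() for _ in range(dims)]
    assignment = []

    for tup in candidates:
        # Accept a tuple only if none of its records has been matched yet.
        if all(i not in used[d] for d, i in enumerate(tup)):
            assignment.append(tup)
            for d, i in enumerate(tup):
                used[d].add(i)
            if len(assignment) == n:
                break

    total_cost = sum(cost[t] for t in assignment)
    return assignment, total_cost
```

Because each accepted tuple is locally cheapest, the result is feasible but not necessarily optimal, which is why a heuristic of this kind is typically paired with an improvement step such as neighborhood search.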
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data
