TL;DR
This paper presents a novel ILP-based approach for entity resolution using correlation-clustering, enhanced by flexible dual optimal inequalities to accelerate column generation, achieving state-of-the-art accuracy on benchmark datasets.
Contribution
Introduces a new ILP formulation of entity resolution as correlation-clustering, together with flexible dual optimal inequalities that improve optimization efficiency.
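One way to write such a formulation is sketched below as a weighted set-packing ILP, its LP relaxation, and the corresponding dual. The notation is assumed for illustration and not taken from the paper: $\mathcal{D}$ is the set of observations (sources), $\mathcal{G}$ the set of candidate clusters, $G_{dg} \in \{0,1\}$ indicates that observation $d$ belongs to cluster $g$, and $\theta_{d_1 d_2}$ is a hypothetical pairwise cost that is negative when two observations likely refer to the same entity.

```latex
\begin{aligned}
\text{(cluster cost)}\quad & \Gamma_g = \sum_{d_1 < d_2} \theta_{d_1 d_2}\, G_{d_1 g}\, G_{d_2 g} \\
\text{(ILP)}\quad & \min_{\gamma \in \{0,1\}^{|\mathcal{G}|}} \sum_{g \in \mathcal{G}} \Gamma_g \gamma_g
  \quad \text{s.t.} \quad \sum_{g \in \mathcal{G}} G_{dg}\, \gamma_g \le 1 \;\; \forall d \in \mathcal{D} \\
\text{(LP relaxation)}\quad & \text{replace } \gamma_g \in \{0,1\} \text{ by } \gamma_g \ge 0 \\
\text{(LP dual)}\quad & \max_{\lambda \ge 0} \; -\sum_{d \in \mathcal{D}} \lambda_d
  \quad \text{s.t.} \quad \Gamma_g + \sum_{d \in \mathcal{D}} \lambda_d G_{dg} \ge 0 \;\; \forall g \in \mathcal{G}
\end{aligned}
```

Under this assumed cost, a singleton cluster costs zero, so an observation left uncovered can be read as its own entity; that is why packing ($\le 1$) rather than partitioning ($= 1$) is enough. Column generation would solve the LP over a small subset of $\mathcal{G}$ and add any cluster whose reduced cost $\Gamma_g + \sum_d \lambda_d G_{dg}$ is negative.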
Findings
Achieves state-of-the-art accuracy on benchmark datasets.
Demonstrates significant acceleration of the column generation process.
Validates the effectiveness of flexible dual optimal inequalities.
Abstract
In this paper, we introduce a new optimization approach to Entity Resolution. Traditional approaches tackle entity resolution with hierarchical clustering, which does not benefit from a formal optimization formulation. In contrast, we model entity resolution as correlation-clustering, which we treat as a weighted set-packing problem and write as an integer linear program (ILP). In this formulation, sources in the input data correspond to elements, and entities in the output data correspond to sets/clusters. We tackle optimization of weighted set packing by relaxing integrality in our ILP formulation. Because the set of potential sets/clusters cannot be explicitly enumerated, we optimize via column generation. In addition to the novel formulation, we also introduce new dual optimal inequalities (DOIs), which we call flexible dual optimal inequalities, that tightly lower-bound dual variables…
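To make concrete why explicit enumeration of clusters is impractical, the toy sketch below prices columns by brute force over all 2^n - 1 non-empty subsets of n observations, which only works for a handful of observations. It assumes hypothetical pairwise costs `theta` (negative meaning "likely the same entity") and non-negative dual values `lam` for the set-packing constraints, so a cluster's reduced cost is its total pairwise cost plus the duals of its members, and a column is worth adding only when that value is negative. The function names and numbers are illustrative assumptions; this does not reproduce the paper's pricing procedure or its flexible DOIs.

```python
from itertools import combinations


def cluster_cost(cluster, theta):
    """Total pairwise cost of a cluster; pairs missing from theta cost 0 (assumption)."""
    return sum(theta.get(pair, 0.0) for pair in combinations(sorted(cluster), 2))


def reduced_cost(cluster, theta, lam):
    """Cluster cost plus the dual values of its members (packing-constraint duals)."""
    return cluster_cost(cluster, theta) + sum(lam[d] for d in cluster)


def price_by_enumeration(elements, theta, lam, tol=1e-12):
    """Exhaustive pricing: return (cluster, reduced_cost) with the most negative
    reduced cost over all non-empty subsets, or (None, 0.0) if none is negative.
    Exponential in len(elements), so only usable on toy instances."""
    best, best_rc = None, 0.0
    for size in range(1, len(elements) + 1):
        for cluster in combinations(elements, size):
            rc = reduced_cost(cluster, theta, lam)
            if rc < best_rc - tol:
                best, best_rc = cluster, rc
    return best, best_rc


# Hypothetical pairwise costs on four observations (keys are sorted pairs).
theta = {(0, 1): -2.0, (0, 2): -1.0, (1, 2): -1.5,
         (0, 3): 0.5, (1, 3): 0.5, (2, 3): 1.0}

print(price_by_enumeration([0, 1, 2, 3], theta, {d: 0.0 for d in range(4)}))
# -> ((0, 1, 2), -4.5)
print(price_by_enumeration([0, 1, 2, 3], theta, {0: 1.5, 1: 1.5, 2: 1.5, 3: 0.0}))
# -> (None, 0.0)
```

With all duals at zero, the best column is the tightly linked trio (0, 1, 2). Once each of its members carries a dual of 1.5, no subset has a negative reduced cost, which is the point at which a column generation loop would stop adding clusters.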
