Efficient Model Repository for Entity Resolution: Construction, Search, and Integration
Victor Christen, Peter Christen

TL;DR
MoRER builds a repository of classification models for entity resolution so that models can be reused across similar tasks with limited labeling effort. It matches or outperforms label-limited, self-supervised, and supervised transformer-based methods.
Contribution
Introduces MoRER, a method for constructing a model repository for ER that leverages feature analysis and clustering to improve performance with less labeling effort.
Findings
MoRER achieves results comparable to or better than those of label-limited methods.
It outperforms self-supervised approaches using large pre-trained models.
MoRER performs comparably to or better than supervised transformer-based methods.
Abstract
Entity resolution (ER) is a fundamental task in data integration that enables insights from heterogeneous data sources. The primary challenge of ER lies in classifying record pairs as matches or nonmatches, which in multi-source ER (MS-ER) scenarios can become complicated due to data source heterogeneity and scalability issues. Existing methods for MS-ER generally require labeled record pairs, and such methods fail to effectively reuse models across multiple ER tasks. We propose MoRER (Model Repositories for Entity Resolution), a novel method for building a model repository consisting of classification models that solve ER problems. By leveraging feature distribution analysis, MoRER clusters similar ER tasks, thereby enabling the effective initialization of a model repository with a moderate labeling effort. Experimental results on three multi-source datasets demonstrate that MoRER…
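The idea of grouping ER tasks by their feature distributions can be illustrated with a minimal sketch. This is not the authors' implementation: the distance measure (a two-sample Kolmogorov-Smirnov statistic averaged over shared features), the threshold-based grouping, and all names (ks_distance, task_distance, cluster_tasks, the task and feature labels, the threshold of 0.3) are assumptions chosen for illustration. Each task is represented as a mapping from a similarity feature to the values observed over its record pairs.

```python
from bisect import bisect_right
from itertools import combinations


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def task_distance(task_a, task_b):
    """Mean per-feature KS distance over the features both tasks share."""
    shared = sorted(set(task_a) & set(task_b))
    if not shared:
        raise ValueError("tasks have no similarity features in common")
    return sum(ks_distance(task_a[f], task_b[f]) for f in shared) / len(shared)


def cluster_tasks(tasks, threshold=0.3):
    """Group tasks linked by a distance below `threshold` (connected components)."""
    parent = {name: name for name in tasks}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(tasks, 2):
        if task_distance(tasks[a], tasks[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in tasks:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy tasks: similarity values per feature for each pair of sources.
tasks = {
    "shopA_shopB": {"title_sim": [0.1, 0.2, 0.8, 0.9], "price_sim": [0.2, 0.3, 0.7, 0.9]},
    "shopA_shopC": {"title_sim": [0.15, 0.25, 0.85, 0.95], "price_sim": [0.2, 0.35, 0.75, 0.9]},
    "shopB_shopD": {"title_sim": [0.5, 0.55, 0.6, 0.65], "price_sim": [0.45, 0.5, 0.55, 0.6]},
}
clusters = cluster_tasks(tasks)
print(clusters)
```

In a setup like this, one classifier could be trained per cluster and stored in the repository, so that labels are spent once per group of similar tasks rather than once per task. How MoRER itself measures distribution similarity and forms clusters is described in the paper, not here.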
