CollaborER: A Self-supervised Entity Resolution Framework Using Multi-features Collaboration
Congcong Ge, Pengfei Wang, Lu Chen, Xiaoze Liu, Baihua Zheng, Yunjun Gao

TL;DR
CollaborER is a self-supervised framework for entity resolution that leverages multi-feature collaboration to achieve high accuracy without human annotations, outperforming existing unsupervised methods and rivaling supervised approaches.
Contribution
It introduces a novel two-phase self-supervised framework with automatic label generation and collaborative training for stable, annotation-free entity resolution.
Findings
Outperforms all existing unsupervised ER methods.
Comparable or superior to state-of-the-art supervised ER methods.
Effective in discovering graph and sentence features for matching.
Abstract
Entity Resolution (ER) aims to identify whether two tuples refer to the same real-world entity and is well-known to be labor-intensive. It is a prerequisite to anomaly detection, as comparing the attribute values of two matched tuples from two different datasets provides one effective way to detect anomalies. Existing ER approaches, due to insufficient feature discovery or error-prone inherent characteristics, are not able to achieve stable performance. In this paper, we present CollaborER, a self-supervised entity resolution framework via multi-features collaboration. It is capable of (i) obtaining reliable ER results with zero human annotations and (ii) discovering adequate tuples' features in a fault-tolerant manner. CollaborER consists of two phases, i.e., automatic label generation (ALG) and collaborative ER training (CERT). In the first phase, ALG is proposed to generate a set of…
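The abstract notes that comparing the attribute values of two matched tuples is one way to detect anomalies. The following is a minimal sketch of that downstream step only, assuming the matched pairs have already been produced by some ER system. It is not part of CollaborER itself, and the names `Record`, `attribute_conflicts`, and `flag_anomalies`, as well as the simple case-insensitive comparison, are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical record type: attribute name -> string value.
Record = Dict[str, str]


def attribute_conflicts(left: Record, right: Record) -> List[str]:
    """Return the shared attributes whose normalized values disagree."""
    shared = sorted(set(left) & set(right))
    return [
        attr
        for attr in shared
        # Assumed normalization: trim whitespace and ignore case.
        if left[attr].strip().lower() != right[attr].strip().lower()
    ]


def flag_anomalies(
    matches: List[Tuple[Record, Record]]
) -> List[Tuple[int, List[str]]]:
    """For each matched pair, list conflicting attributes.

    Only pairs with at least one conflict are returned, as
    (index of the pair, conflicting attribute names).
    """
    flagged = []
    for idx, (left, right) in enumerate(matches):
        conflicts = attribute_conflicts(left, right)
        if conflicts:
            flagged.append((idx, conflicts))
    return flagged
```

For example, a pair of tuples matched as the same product but carrying different `price` values would be flagged on `price`, while a pair that differs only in letter case or surrounding whitespace would not.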
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Web Data Mining and Analysis
