ALER: An Active Learning Hybrid System for Efficient Entity Resolution
Dimitrios Karapiperis, Leonidas Akritidis, Panayiotis Bozanis, Vassilios Verykios

TL;DR
ALER is a scalable semi-supervised active learning system for entity resolution. By combining static embeddings, representative sampling, and hybrid query strategies, it accelerates training by 1.3x and reduces resolution latency by a factor of 3.8 on large datasets.
Contribution
ALER presents a novel semi-supervised pipeline that overcomes scalability issues in active learning for entity resolution by using frozen bi-encoders and efficient sampling techniques.
Findings
Accelerates training by 1.3x on large datasets.
Reduces resolution latency by a factor of 3.8.
Outperforms existing methods in efficiency and scalability.
Abstract
Entity Resolution (ER) is a critical task for data integration, yet state-of-the-art supervised deep learning models remain impractical for many real-world applications due to their need for massive, expensive-to-obtain labeled datasets. While Active Learning (AL) offers a potential solution to this "label scarcity" problem, existing approaches introduce severe scalability bottlenecks. Specifically, they achieve high accuracy but incur prohibitive computational costs by re-training complex models from scratch or solving NP-hard selection problems in every iteration. In this paper, we propose ALER, a novel, semi-supervised pipeline designed to bridge the gap between semantic accuracy and computational scalability. ALER eliminates the training bottleneck by using a frozen bi-encoder architecture to generate static embeddings once and then iteratively training a lightweight classifier on…
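The loop the abstract describes (embed every record once with a frozen encoder, then repeatedly retrain only a lightweight classifier as new labels arrive) can be sketched as follows. This is a minimal illustration under stated assumptions, not ALER's implementation: the character-bigram `encode` function is a toy stand-in for a frozen bi-encoder, the one-feature logistic regression is a stand-in for the lightweight classifier, and plain uncertainty sampling is swapped in for the paper's hybrid query strategy, which the truncated abstract does not specify. All names and the toy product records are hypothetical.

```python
import math
from collections import Counter

def encode(text):
    """Toy stand-in for a frozen bi-encoder: a fixed bigram-count vector."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(model, x):
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def train_logreg(xs, ys, epochs=500, lr=0.5):
    """Fit p(match) = sigmoid(w * x + b) with batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = predict((w, b), x) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b

def active_learning(features, oracle, seed_ids, rounds=3):
    """Query the most uncertain unlabeled pair, then retrain the classifier."""
    labeled = {i: oracle(i) for i in seed_ids}
    model = train_logreg([features[i] for i in labeled], list(labeled.values()))
    for _ in range(rounds):
        pool = [i for i in range(len(features)) if i not in labeled]
        if not pool:
            break
        # Uncertainty sampling: predicted probability closest to 0.5.
        query = min(pool, key=lambda i: abs(predict(model, features[i]) - 0.5))
        labeled[query] = oracle(query)
        # Only the small classifier is retrained; embeddings stay fixed.
        model = train_logreg([features[i] for i in labeled], list(labeled.values()))
    return model, labeled

# Hypothetical candidate pairs with ground-truth labels (1 = same entity).
pairs = [
    ("Apple iPhone 13 128GB", "apple iphone 13 (128 gb)"),
    ("Sony WH-1000XM4 headphones", "Sony WH1000XM4 Headphones"),
    ("Dell XPS 13 laptop", "Dell XPS 13 Laptop 2021"),
    ("Apple iPhone 13 128GB", "Samsung Galaxy S21"),
    ("Sony WH-1000XM4 headphones", "Bose QuietComfort 45"),
    ("Dell XPS 13 laptop", "HP Spectre x360"),
    ("Canon EOS R6 camera", "Canon EOS R6 Camera Body"),
    ("Canon EOS R6 camera", "Nikon Z6 II"),
]
truth = [1, 1, 1, 0, 0, 0, 1, 0]

# Static embeddings: each distinct record is encoded exactly once.
cache = {record: encode(record) for pair in pairs for record in pair}
features = [cosine(cache[a], cache[b]) for a, b in pairs]

# The oracle simulates a human annotator answering label queries.
model, labeled = active_learning(features, lambda i: truth[i], seed_ids=[0, 3])
print(f"labels used: {len(labeled)} of {len(pairs)}")
```

The point of the sketch is the cost structure: `encode` runs once per record before the loop starts, so each active learning round pays only for retraining a two-parameter model rather than for re-encoding or fine-tuning.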
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Machine Learning in Healthcare
