SPER: Accelerating Progressive Entity Resolution via Stochastic Bipartite Maximization

Dimitrios Karapiperis; George Papadakis; Vassilios Verykios

arXiv:2512.23491 · cs.DB · January 5, 2026

Open Access

TL;DR

SPER introduces a stochastic sampling approach to accelerate progressive entity resolution, enabling linear-time prioritization that significantly outperforms existing methods in speed while maintaining accuracy.

Contribution

The paper proposes a novel stochastic bipartite maximization framework that replaces sorting-based prioritization, yielding fast, scalable progressive ER suited to high-velocity data streams.

Findings

1. SPER achieves a 3x to more than 6x speedup over baselines.
2. SPER maintains recall and precision comparable to state-of-the-art methods.
3. SPER scales effectively to high-velocity data streams.

Abstract

Entity Resolution (ER) is a critical data cleaning task for identifying records that refer to the same real-world entity. In the era of Big Data, traditional batch ER is often infeasible due to volume and velocity constraints, necessitating Progressive ER methods that maximize recall within a limited computational budget. However, existing progressive approaches fail to scale to high-velocity streams because they rely on deterministic sorting to prioritize candidate pairs, a process that incurs prohibitive super-linear complexity and heavy initialization costs. To address this scalability wall, we introduce SPER (Stochastic Progressive ER), a novel framework that redefines prioritization as a sampling problem rather than a ranking problem. By replacing global sorting with a continuous stochastic bipartite maximization strategy, SPER acts as a probabilistic high-pass filter that selects…
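To make the contrast in the abstract concrete, here is a minimal sketch of sorting-based prioritization next to a single-pass stochastic filter. It is not the authors' algorithm and does not model the bipartite maximization step. The function names, the `sharpness` parameter, and the acceptance rule (keep a pair with probability `weight ** sharpness`) are all assumptions chosen only to illustrate what a "probabilistic high-pass filter" could look like.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical candidate pair: (left_id, right_id, similarity weight in [0, 1]).
Pair = Tuple[int, int, float]


def sorted_prioritization(pairs: Iterable[Pair], budget: int) -> List[Pair]:
    """Baseline: sort every candidate pair by weight, then take the top ones.

    Needs all pairs up front and costs O(n log n) for the sort.
    """
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:budget]


def stochastic_filter(
    pairs: Iterable[Pair],
    budget: int,
    sharpness: float = 4.0,  # assumed knob: higher values suppress low weights more
    rng: Optional[random.Random] = None,
) -> Iterator[Pair]:
    """Illustrative sampler: one pass over the stream, no sorting.

    Each pair is emitted with probability weight ** sharpness, so high-weight
    pairs pass almost always and low-weight pairs are mostly dropped.
    """
    rng = rng or random.Random()
    emitted = 0
    for left, right, weight in pairs:
        if emitted >= budget:
            return  # comparison budget exhausted
        if rng.random() < weight ** sharpness:
            emitted += 1
            yield (left, right, weight)


if __name__ == "__main__":
    source = random.Random(1)
    stream = [(i, i, source.random()) for i in range(10_000)]
    kept = list(stochastic_filter(stream, budget=500, rng=random.Random(2)))
    mean_kept = sum(w for _, _, w in kept) / len(kept)
    print(f"kept {len(kept)} pairs, mean weight {mean_kept:.2f}")
```

Under these assumptions the filter costs constant work per pair and can start emitting pairs before the stream ends, whereas the baseline must see and sort everything first. The trade-off is that the sampler may occasionally skip a high-weight pair or admit a low-weight one.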

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Time Series Analysis and Forecasting