WDC Products: A Multi-Dimensional Entity Matching Benchmark
Ralph Peeters, Reng Chiz Der, Christian Bizer

TL;DR
WDC Products is a comprehensive benchmark for entity matching that evaluates systems across multiple real-world dimensions, including unseen entities and corner-cases, using both pairwise and multi-class formulations.
Contribution
It introduces a novel multi-dimensional benchmark for entity matching that considers unseen entities, corner-cases, and varying training set sizes, with both pairwise and multi-class tasks.
Findings
All systems struggle with unseen entities.
Contrastive learning is more data-efficient than cross-encoders.
The benchmark enables systematic evaluation across multiple challenging dimensions.
Abstract
The difficulty of an entity matching task depends on a combination of multiple factors such as the amount of corner-case pairs, the fraction of entities in the test set that have not been seen during training, and the size of the development set. Current entity matching benchmarks usually represent single points in the space along such dimensions or they provide for the evaluation of matching methods along a single dimension, for instance the amount of training data. This paper presents WDC Products, an entity matching benchmark which provides for the systematic evaluation of matching systems along combinations of three dimensions while relying on real-world data. The three dimensions are (i) amount of corner-cases, (ii) generalization to unseen entities, and (iii) development set size (training set plus validation set). Generalization to unseen entities is a dimension not covered by any…
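The "unseen entities" dimension described in the abstract, the fraction of test-set entities that never appear during training, can be illustrated with a minimal sketch. The function name `unseen_fraction` and the entity identifiers below are hypothetical and used only for illustration; this is not the benchmark's own tooling.

```python
def unseen_fraction(train_entity_ids, test_entity_ids):
    """Share of distinct test entities that never occur in the training data.

    Both arguments are iterables of entity identifiers (hypothetical IDs here).
    """
    train = set(train_entity_ids)
    test = set(test_entity_ids)
    if not test:
        raise ValueError("test set contains no entities")
    # Entities present in the test set but absent from training are "unseen".
    return len(test - train) / len(test)


# Illustrative data: two of the four test entities also occur in training.
train_ids = ["p1", "p2", "p3", "p4"]
test_ids = ["p3", "p4", "p5", "p6"]
print(unseen_fraction(train_ids, test_ids))
```

A value of 0.0 means every test entity was seen during training, and 1.0 means none were.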
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
Methods: Contrastive Learning · Test
