Machop: an End-to-End Generalized Entity Matching Framework
Jin Wang, Yuliang Li, Wataru Hirota, Eser Kandogan

TL;DR
Machop introduces a flexible, end-to-end framework for generalized entity matching that leverages Transformer models and external knowledge, significantly improving matching accuracy across diverse real-world applications.
Contribution
The paper formulates the Generalized Entity Matching (GEM) problem and presents Machop, an end-to-end pipeline that supports domain-specific, semantics-rich entity matching using language models and knowledge injection.
Findings
Achieves 17.1% F1 score improvement over state-of-the-art methods.
Supports flexible, domain-adaptable matching tasks.
Produces human-understandable matching results.
Abstract
Real-world applications frequently seek to solve a general form of the Entity Matching (EM) problem to find associated entities. Such scenarios include matching jobs to candidates in job targeting, matching students with courses in online education, matching products with user reviews on e-commerce websites, and beyond. These tasks impose new requirements, such as matching data entries with diverse formats or having a flexible and semantics-rich matching definition, which are beyond the current EM task formulation or approaches. In this paper, we introduce the problem of Generalized Entity Matching (GEM) that satisfies these practical requirements and present an end-to-end pipeline, Machop, as the solution. Machop allows end-users to define new matching tasks from scratch and apply them to new domains in a step-by-step manner. Machop casts the GEM problem as sequence pair classification…
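The sequence-pair framing mentioned in the abstract can be illustrated with a minimal sketch. It is not Machop's actual code: the `[COL]`/`[VAL]` attribute markers, the literal `[CLS]`/`[SEP]` tokens, and the helper names `serialize` and `to_sequence_pair` are all assumptions chosen for illustration. The idea shown is only that two entries of different formats (here a structured job record and a free-text resume snippet) can be flattened into one text pair that a Transformer classifier could label as match or no match.

```python
from typing import Mapping, Union

# An entry is either free text or a flat attribute-value record (assumed shapes).
Entry = Union[str, Mapping[str, object]]


def serialize(entry: Entry) -> str:
    """Flatten one entry into a single string (hypothetical scheme)."""
    if isinstance(entry, str):
        # Free text: only normalize whitespace.
        return " ".join(entry.split())
    # Structured record: mark each attribute name and value with special tokens.
    return " ".join(f"[COL] {key} [VAL] {value}" for key, value in entry.items())


def to_sequence_pair(left: Entry, right: Entry) -> str:
    """Join two serialized entries into one classifier input string."""
    # In practice a tokenizer would normally add [CLS]/[SEP]; they are
    # written out literally here to make the pair structure visible.
    return f"[CLS] {serialize(left)} [SEP] {serialize(right)} [SEP]"


job = {"title": "Data Engineer", "location": "Tokyo"}
resume = "Five years building  ETL pipelines in Python."
pair = to_sequence_pair(job, resume)
print(pair)
```

A binary classifier fine-tuned on such pairs would then output whether the two entries match under the task's definition; how Machop serializes entries and injects external knowledge is described in the paper itself, not here.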
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Natural Language Processing Techniques
