PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching

Pengfei Wang; Xiaocan Zeng; Lu Chen; Fan Ye; Yuren Mao; Junhao Zhu; Yunjun Gao

arXiv:2207.04802 · cs.DB · July 19, 2022 · 5 cites

TL;DR

This paper introduces PromptEM, a novel prompt-tuning approach for low-resource generalized entity matching, effectively reducing labeling efforts and improving performance across diverse data formats.

Contribution

PromptEM is the first method to apply prompt-tuning to low-resource GEM, addressing prompt design, pseudo-label quality, and efficient self-training.
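To make the prompt-design idea concrete, here is a minimal sketch of how a pair of format-different records might be cast as a cloze-style prompt. It is an illustration under stated assumptions, not the paper's implementation: the `serialize` scheme, the template wording, the `[MASK]` string, and the `LABEL_WORDS` mapping are all hypothetical choices made for this example.

```python
from typing import Any, Dict

# Assumption: a generic mask placeholder; the real string depends on the language model used.
MASK = "[MASK]"

# Assumption: hypothetical label words mapped to each class (not taken from the paper).
LABEL_WORDS = {
    "match": ["matched", "same"],
    "mismatch": ["mismatched", "different"],
}


def serialize(record: Any) -> str:
    """Flatten a relational row (dict), a nested semi-structured record, or free text."""
    if isinstance(record, str):
        return record.strip()
    if isinstance(record, dict):
        return " ".join(f"{key}: {serialize(value)}" for key, value in record.items())
    if isinstance(record, (list, tuple)):
        return " ".join(serialize(item) for item in record)
    return str(record)


def build_prompt(left: Any, right: Any) -> str:
    """Wrap two serialized records in a cloze template with one masked slot."""
    return f"{serialize(left)} {serialize(right)} They are {MASK}."


def predict(mask_scores: Dict[str, float]) -> str:
    """Pick the class whose label words receive the highest total score at the mask."""
    totals = {
        label: sum(mask_scores.get(word, 0.0) for word in words)
        for label, words in LABEL_WORDS.items()
    }
    return max(totals, key=totals.get)


if __name__ == "__main__":
    relational = {"title": "iPhone 13", "price": 799}
    textual = "Apple iPhone 13 smartphone"
    print(build_prompt(relational, textual))
    # Scores here are made up; in practice they would come from a language model.
    print(predict({"same": 0.6, "different": 0.3}))
```

In this sketch the matching decision reduces to comparing scores of a few label words at the masked position, which is why prompt-based approaches can work with few labeled pairs: the language model's pretraining objective is reused rather than replaced by a new classification head.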

Findings

01. Outperforms existing methods on eight benchmarks.

02. Effective in low-resource settings with limited labeled data.

03. Demonstrates superior efficiency and accuracy.

Abstract

Entity Matching (EM), which aims to identify whether two entity records from two relational tables refer to the same real-world entity, is one of the fundamental problems in data management. Traditional EM assumes that the two tables are homogeneous and share an aligned schema, yet practical scenarios commonly involve entity records of different formats (e.g., relational, semi-structured, or textual types). Because of the different formats, unifying their schemas is not practical. To support EM on records of different formats, Generalized Entity Matching (GEM) has been proposed and has gained much attention recently. Existing GEM methods typically rely on supervised learning, which requires a large number of high-quality labeled examples. However, the labeling process is extremely labor-intensive, which hinders the use of GEM. Low-resource GEM, i.e., GEM that only…

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Data Mining Algorithms and Applications