Demonstration of Panda: A Weakly Supervised Entity Matching System

Renzhi Wu; Prem Sakala; Peng Li; Xu Chu; Yeye He

arXiv:2106.10821 · cs.DB · September 27, 2021

TL;DR

Panda is a weakly supervised entity matching (EM) system that combines user-defined labeling functions with an integrated IDE, so that users can develop high-quality EM solutions without large amounts of labeled data.

Contribution

The paper introduces Panda, a novel weakly supervised EM system with an IDE that simplifies and accelerates the creation of effective labeling functions for entity matching.

Findings

1. Panda's IDE significantly speeds up EM development.
2. Weak supervision with Panda achieves competitive accuracy.
3. The system reduces labeling effort and time.

Abstract

Entity matching (EM) refers to the problem of identifying tuple pairs in one or more relations that refer to the same real-world entities. Supervised machine learning (ML) approaches, and deep-learning-based approaches in particular, typically achieve state-of-the-art matching results. However, these approaches require many labeled examples, in the form of matching and non-matching pairs, which are expensive and time-consuming to label. In this paper, we introduce Panda, a weakly supervised system specifically designed for EM. Panda uses the same labeling function abstraction as Snorkel, where labeling functions (LFs) are user-provided programs that can generate large amounts of (somewhat noisy) labels quickly and cheaply, which can then be combined via a labeling model to generate accurate final predictions. To support users developing LFs for EM, Panda provides an integrated…
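The labeling-function workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not Panda's code or API: the record fields (`title`, `price`, `model`), the three LFs and their thresholds, and the `MATCH`/`NON_MATCH`/`ABSTAIN` encoding are all hypothetical. A plain majority vote is swapped in as the simplest possible stand-in for the labeling model that Panda actually uses to combine LF outputs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label encoding (Snorkel-style): 1 = match, 0 = non-match, -1 = abstain.
MATCH, NON_MATCH, ABSTAIN = 1, 0, -1

Record = Dict[str, str]
Pair = Tuple[Record, Record]
LabelingFunction = Callable[[Record, Record], int]


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity of two strings (case-insensitive)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def lf_title_overlap(left: Record, right: Record) -> int:
    """Hypothetical LF: high title overlap -> MATCH, very low overlap -> NON_MATCH."""
    t1, t2 = left.get("title", ""), right.get("title", "")
    if not t1 or not t2:
        return ABSTAIN  # no evidence either way
    sim = jaccard(t1, t2)
    if sim >= 0.8:
        return MATCH
    if sim <= 0.2:
        return NON_MATCH
    return ABSTAIN


def lf_price_gap(left: Record, right: Record) -> int:
    """Hypothetical LF: prices differing by more than 50% suggest NON_MATCH."""
    try:
        p1, p2 = float(left["price"]), float(right["price"])
    except (KeyError, ValueError):
        return ABSTAIN  # missing or unparsable price
    top = max(p1, p2)
    if top > 0 and abs(p1 - p2) / top > 0.5:
        return NON_MATCH
    return ABSTAIN  # similar prices alone do not imply a match


def lf_model_number(left: Record, right: Record) -> int:
    """Hypothetical LF: compare model numbers when both records have one."""
    m1, m2 = left.get("model", ""), right.get("model", "")
    if not m1 or not m2:
        return ABSTAIN
    return MATCH if m1.lower() == m2.lower() else NON_MATCH


def apply_lfs(pairs: List[Pair], lfs: List[LabelingFunction]) -> List[List[int]]:
    """Build the label matrix: one row per tuple pair, one column per LF."""
    return [[lf(left, right) for lf in lfs] for left, right in pairs]


def majority_vote(votes: List[int]) -> int:
    """Stand-in labeling model: majority of non-abstaining votes; ties -> ABSTAIN."""
    matches, non_matches = votes.count(MATCH), votes.count(NON_MATCH)
    if matches > non_matches:
        return MATCH
    if non_matches > matches:
        return NON_MATCH
    return ABSTAIN


# Hypothetical candidate pairs for a product-matching task.
pairs: List[Pair] = [
    ({"title": "Apple iPhone 12 64GB Black", "price": "699", "model": "A2172"},
     {"title": "apple iphone 12 64gb black", "price": "689", "model": "a2172"}),
    ({"title": "Sony WH-1000XM4 Headphones", "price": "348", "model": "WH1000XM4"},
     {"title": "Logitech MX Master 3 Mouse", "price": "99", "model": "910-005620"}),
]

label_matrix = apply_lfs(pairs, [lf_title_overlap, lf_price_gap, lf_model_number])
predictions = [majority_vote(row) for row in label_matrix]
print(label_matrix)  # [[1, -1, 1], [0, 0, 0]]
print(predictions)   # [1, 0]
```

Each LF abstains when it has no evidence, so individual LFs can be noisy and incomplete yet still useful once combined. A learned labeling model, such as Snorkel's, can additionally weight LFs by their estimated accuracies, which a plain vote cannot do.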

Taxonomy

Topics: Data Quality and Management · Advanced Database Systems and Queries · Topic Modeling