Demonstration of Panda: A Weakly Supervised Entity Matching System
Renzhi Wu, Prem Sakala, Peng Li, Xu Chu, Yeye He

TL;DR
Panda is a weakly supervised entity matching system that leverages user-defined labeling functions and an integrated IDE to efficiently develop high-quality EM solutions without extensive labeled data.
Contribution
The paper introduces Panda, a novel weakly supervised EM system with an IDE that simplifies and accelerates the creation of effective labeling functions for entity matching.
Findings
Panda's IDE significantly speeds up EM development.
Weak supervision with Panda achieves competitive accuracy.
The system reduces labeling effort and time.
Abstract
Entity matching (EM) refers to the problem of identifying tuple pairs in one or more relations that refer to the same real-world entities. Supervised machine learning (ML) approaches, and deep-learning-based approaches in particular, typically achieve state-of-the-art matching results. However, these approaches require many labeled examples, in the form of matching and non-matching pairs, which are expensive and time-consuming to label. In this paper, we introduce Panda, a weakly supervised system specifically designed for EM. Panda uses the same labeling function abstraction as Snorkel, where labeling functions (LFs) are user-provided programs that can generate large amounts of (somewhat noisy) labels quickly and cheaply, which can then be combined via a labeling model to generate accurate final predictions. To support users developing LFs for EM, Panda provides an integrated…
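To make the labeling function abstraction concrete, here is a minimal sketch in plain Python. It is not Panda's or Snorkel's API: the record fields (`name`, `phone`), the thresholds, the label constants, and every function name are hypothetical and chosen only for illustration. A simple majority vote stands in for the labeling model, which in practice would estimate each LF's accuracy rather than weight all LFs equally.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label constants; an LF may also abstain on a pair.
MATCH, NON_MATCH, ABSTAIN = 1, -1, 0

Record = Dict[str, str]
LabelingFunction = Callable[[Record, Record], int]


def _tokens(text: str) -> set:
    """Lowercase whitespace tokens of a string."""
    return set(text.lower().split())


def lf_name_overlap(left: Record, right: Record) -> int:
    """Vote using Jaccard similarity of name tokens (thresholds are assumed)."""
    a, b = _tokens(left.get("name", "")), _tokens(right.get("name", ""))
    if not a or not b:
        return ABSTAIN
    jaccard = len(a & b) / len(a | b)
    if jaccard >= 0.8:
        return MATCH
    if jaccard <= 0.2:
        return NON_MATCH
    return ABSTAIN


def lf_phone_equal(left: Record, right: Record) -> int:
    """Vote by comparing phone numbers after stripping non-digit characters."""
    p = "".join(ch for ch in left.get("phone", "") if ch.isdigit())
    q = "".join(ch for ch in right.get("phone", "") if ch.isdigit())
    if not p or not q:
        return ABSTAIN
    return MATCH if p == q else NON_MATCH


def majority_vote(pair: Tuple[Record, Record], lfs: List[LabelingFunction]) -> int:
    """Stand-in for a labeling model: sum the votes and take the sign."""
    total = sum(lf(pair[0], pair[1]) for lf in lfs)
    if total > 0:
        return MATCH
    if total < 0:
        return NON_MATCH
    return ABSTAIN


if __name__ == "__main__":
    a = {"name": "Blue Bottle Coffee", "phone": "(415) 555-0100"}
    b = {"name": "blue bottle coffee", "phone": "415-555-0100"}
    print(majority_vote((a, b), [lf_name_overlap, lf_phone_equal]))
```

Each LF is cheap to write and individually noisy, and it abstains when it has no evidence. The combination step is where the noise is reconciled, so replacing `majority_vote` with a learned labeling model would leave the LFs themselves unchanged.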
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Topic Modeling
