Can Large Language Models Design Accurate Label Functions?

Naiqing Guan; Kaiwen Chen; Nick Koudas

arXiv:2311.00739 · cs.CL · November 3, 2023 · 1 citation

TL;DR

This paper introduces DataSculpt, an interactive framework that uses pre-trained language models (PLMs) to automate the creation of label functions (LFs) for weak supervision, and evaluates it on 12 real-world datasets.

Contribution

It presents a novel framework that combines prompting, instance selection, and filtering to enable PLMs to generate accurate label functions autonomously.
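The three ingredients named above (prompting, instance selection, and filtering) can be wired together as one generation loop. The sketch below is not DataSculpt's implementation. It assumes a hypothetical keyword-LF format `(keyword, label)`, a hypothetical "query the instance covered by the fewest current LFs" selection rule, an assumed accuracy threshold measured on a small labeled validation set, a made-up prompt, and `toy_plm`, a lexicon lookup standing in for a real PLM call.

```python
from typing import Callable, List, Optional, Tuple

ABSTAIN = -1
KeywordLF = Tuple[str, int]  # hypothetical LF format: (keyword, label)


def apply_lf(lf: KeywordLF, text: str) -> int:
    """Vote `label` if the keyword occurs in the text, otherwise abstain."""
    keyword, label = lf
    return label if keyword in text.lower() else ABSTAIN


def select_instance(pool: List[str], lfs: List[KeywordLF]) -> str:
    """Assumed selection rule: query the text covered by the fewest current LFs."""
    return min(pool, key=lambda t: sum(apply_lf(lf, t) != ABSTAIN for lf in lfs))


def build_prompt(text: str) -> str:
    """Hypothetical prompt asking the PLM for one keyword LF."""
    return (
        "Classify the review as 0 (negative) or 1 (positive) and give one keyword "
        "that supports your answer, formatted as keyword,label.\n"
        f"Review: {text}"
    )


def parse_response(response: str) -> Optional[KeywordLF]:
    """Turn 'keyword,label' into an LF; discard malformed PLM output."""
    parts = response.strip().rsplit(",", 1)
    if len(parts) != 2 or parts[1].strip() not in ("0", "1"):
        return None
    return (parts[0].strip().lower(), int(parts[1].strip()))


def lf_accuracy(lf: KeywordLF, val: List[Tuple[str, int]]) -> Optional[float]:
    """Accuracy of the LF on the validation items where it does not abstain."""
    preds = [(apply_lf(lf, text), gold) for text, gold in val]
    fired = [(p, gold) for p, gold in preds if p != ABSTAIN]
    if not fired:
        return None  # the LF never fires on validation data: no evidence
    return sum(p == gold for p, gold in fired) / len(fired)


def generate_lfs(
    unlabeled: List[str],
    val: List[Tuple[str, int]],
    plm: Callable[[str], str],
    rounds: int,
    min_acc: float = 0.6,  # assumed filtering threshold
) -> List[KeywordLF]:
    pool = list(unlabeled)
    lfs: List[KeywordLF] = []
    for _ in range(rounds):
        if not pool:
            break
        text = select_instance(pool, lfs)  # 1. instance selection
        pool.remove(text)
        lf = parse_response(plm(build_prompt(text)))  # 2. prompting
        if lf is None or lf in lfs:
            continue
        acc = lf_accuracy(lf, val)  # 3. filtering on validation accuracy
        if acc is not None and acc >= min_acc:
            lfs.append(lf)
    return lfs


def toy_plm(prompt: str) -> str:
    """Stand-in for a real PLM call: looks the review up in a tiny lexicon."""
    review = prompt.rsplit("Review: ", 1)[1].lower()
    for word, label in [("great", 1), ("boring", 0), ("fun", 1), ("bad", 0)]:
        if word in review:
            return f"{word},{label}"
    return "no idea"


unlabeled = ["A great cast", "Boring and slow", "Fun for kids", "Not bad at all"]
val = [("great fun", 1), ("so boring", 0), ("not bad, really", 1), ("bad acting", 0)]
print(generate_lfs(unlabeled, val, toy_plm, rounds=4))
# [('great', 1), ('boring', 0), ('fun', 1)]
```

With these toy inputs, the PLM stand-in proposes a `bad` LF in the last round, but the filter rejects it: the LF fires on "not bad, really" with the wrong label and reaches only 50% validation accuracy. This is the kind of inaccurate PLM-generated LF a filtering step is meant to catch.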

Findings

01 PLMs show promise in automating label function design.

02 DataSculpt outperforms baseline methods on several datasets.

03 The paper identifies limitations in the accuracy of PLM-generated LFs.

Abstract

Programmatic weak supervision methods make it possible to label large datasets quickly through label functions (LFs) that encode heuristic data sources. However, writing accurate LFs requires domain expertise and substantial effort. Recent advances in pre-trained language models (PLMs) have shown considerable promise across diverse tasks, yet whether PLMs can autonomously write accurate LFs remains largely unexplored. In this work, we address this gap by introducing DataSculpt, an interactive framework that uses PLMs to generate LFs automatically. Within DataSculpt, we incorporate a range of prompting techniques, instance selection strategies, and LF filtering methods to explore the broad design space. Finally, we thoroughly assess DataSculpt's performance on 12 real-world…
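To make the abstract's notion of an LF concrete: an LF is a heuristic that either votes for a label or abstains, and a weak supervision system aggregates many such votes into one training label. The following is a minimal sketch under assumed conventions (keyword-matching LFs, `ABSTAIN = -1`, and plain majority vote with ties treated as abstentions). It is not DataSculpt's LF format or label model, and `keyword_lf` and `majority_vote` are hypothetical helpers.

```python
from collections import Counter
from typing import Callable, List

ABSTAIN = -1  # assumed convention: an LF returns -1 when its heuristic does not fire
NEG, POS = 0, 1

LabelFunction = Callable[[str], int]


def keyword_lf(keywords: List[str], label: int) -> LabelFunction:
    """Hypothetical helper: build an LF that votes `label` if any keyword appears."""
    lowered = [k.lower() for k in keywords]

    def lf(text: str) -> int:
        t = text.lower()
        return label if any(k in t for k in lowered) else ABSTAIN

    return lf


def majority_vote(text: str, lfs: List[LabelFunction]) -> int:
    """Aggregate non-abstaining votes; abstain when no LF fires or votes tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


lfs = [
    keyword_lf(["great", "excellent"], POS),
    keyword_lf(["terrible", "awful"], NEG),
]
print(majority_vote("An excellent, great movie", lfs))  # 1
print(majority_vote("Terrible plot but great acting", lfs))  # -1 (conflict)
print(majority_vote("It was a movie", lfs))  # -1 (no LF fires)
```

Majority vote is the simplest aggregator; weak supervision systems often replace it with a learned label model that weights each LF by its estimated accuracy.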

Code & Models

Repositories

gnaiqing/llmdp (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling