ScriptoriumWS: A Code Generation Assistant for Weak Supervision

Tzu-Heng Huang; Catherine Cao; Spencer Schoenberg; Harit Vishwakarma; Nicholas Roberts; Frederic Sala

arXiv:2502.12366 · cs.LG · February 19, 2025

TL;DR

ScriptoriumWS leverages code-generation models as assistants to craft weak supervision sources, enhancing coverage while maintaining accuracy, thus addressing the bottleneck of obtaining high-quality labeling functions.

Contribution

The paper introduces ScriptoriumWS, a system that uses code-generation models to assist in creating weak supervision sources, improving coverage without sacrificing accuracy.

Findings

01. Maintains accuracy compared to hand-crafted sources
02. Greatly improves coverage of weak supervision sources
03. Effective prompting strategies enhance source quality
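
The findings above are stated in terms of coverage and accuracy. As a rough illustration, assuming the usual weak supervision reading in which a source may abstain on an example, coverage is the fraction of examples a source labels and accuracy is measured only on those labeled examples. The `ABSTAIN` value, function names, and toy data below are assumptions made for this sketch and are not taken from the paper.

```python
ABSTAIN = -1  # assumed convention: a source returns -1 when it gives no label


def coverage(votes):
    """Fraction of examples on which the source does not abstain."""
    if not votes:
        return 0.0
    return sum(v != ABSTAIN for v in votes) / len(votes)


def accuracy(votes, gold):
    """Accuracy over only the examples the source actually labels."""
    labeled = [(v, g) for v, g in zip(votes, gold) if v != ABSTAIN]
    if not labeled:
        return 0.0
    return sum(v == g for v, g in labeled) / len(labeled)


# Toy data, invented for illustration only.
votes = [1, ABSTAIN, 0, 1, ABSTAIN]
gold = [1, 0, 0, 0, 1]

print(coverage(votes))        # 3 of 5 examples receive a label
print(accuracy(votes, gold))  # 2 of the 3 labeled examples are correct
```

Under this reading, a source can gain coverage by abstaining less often, and the accuracy figure shows whether the extra labels are still correct.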

Abstract

Weak supervision is a popular framework for overcoming the labeled data bottleneck: the need to obtain labels for training data. In weak supervision, multiple noisy-but-cheap sources are used to provide guesses of the label and are aggregated to produce high-quality pseudolabels. These sources are often expressed as small programs written by domain experts -- and so are expensive to obtain. Instead, we argue for using code-generation models to act as coding assistants for crafting weak supervision sources. We study prompting strategies to maximize the quality of the generated sources, settling on a multi-tier strategy that incorporates multiple types of information. We explore how to best combine hand-written and generated sources. Using these insights, we introduce ScriptoriumWS, a weak supervision system that, when compared to hand-crafted sources, maintains accuracy and greatly…
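
To make the setup in the abstract concrete, here is a minimal sketch of weak supervision sources written as small programs, with their noisy votes combined into a pseudolabel. Everything in it is an assumption for illustration: the spam/ham task, the three keyword rules, and the use of plain majority vote as the aggregator. The abstract does not say which aggregation method ScriptoriumWS uses, so majority vote stands in here only as the simplest option.

```python
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1  # assumed label convention for this sketch


# Hypothetical labeling functions: each is a small, cheap, noisy program.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def majority_vote(votes):
    """Combine votes, ignoring abstentions; abstain on ties or no votes."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def pseudolabel(text):
    """Apply every labeling function to one example and aggregate."""
    return majority_vote([lf(text) for lf in LABELING_FUNCTIONS])


print(pseudolabel("Claim your prize at http://example.com now"))
print(pseudolabel("see you soon"))
print(pseudolabel("I will call you back after the meeting today"))
```

In a workflow like the one the abstract argues for, functions such as `lf_contains_link` would be drafted by a code-generation model and reviewed or combined with hand-written ones, rather than written entirely by a domain expert.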

Taxonomy

Topics: Service-Oriented Architecture and Web Services