ScriptoriumWS: A Code Generation Assistant for Weak Supervision
Tzu-Heng Huang, Catherine Cao, Spencer Schoenberg, Harit Vishwakarma, Nicholas Roberts, Frederic Sala

TL;DR
ScriptoriumWS leverages code-generation models as assistants to craft weak supervision sources, enhancing coverage while maintaining accuracy, thus addressing the bottleneck of obtaining high-quality labeling functions.
Contribution
The paper introduces ScriptoriumWS, a system that uses code-generation models to assist in creating weak supervision sources, improving coverage without sacrificing accuracy.
Findings
Maintains accuracy compared to hand-crafted sources
Greatly improves coverage of weak supervision sources
Effective prompting strategies enhance source quality
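The findings are stated in terms of the coverage and accuracy of weak supervision sources. As a rough illustration, the sketch below uses the common definitions: coverage is the fraction of examples on which a source votes rather than abstains, and accuracy is measured only over the examples it votes on. The abstain value `-1`, the spam task, and the keyword functions are hypothetical and not taken from the paper. The paper's exact metric definitions may differ.

```python
from typing import Callable, List, Sequence

ABSTAIN = -1      # assumed convention: a labeling function returns -1 to abstain
SPAM, HAM = 1, 0  # hypothetical binary task used only for illustration

LabelingFunction = Callable[[str], int]


def coverage(lf: LabelingFunction, xs: Sequence[str]) -> float:
    """Fraction of examples on which a single labeling function casts a vote."""
    return sum(lf(x) != ABSTAIN for x in xs) / len(xs)


def set_coverage(lfs: List[LabelingFunction], xs: Sequence[str]) -> float:
    """Fraction of examples on which at least one labeling function votes."""
    return sum(any(lf(x) != ABSTAIN for lf in lfs) for x in xs) / len(xs)


def accuracy(lf: LabelingFunction, xs: Sequence[str], ys: Sequence[int]) -> float:
    """Accuracy measured only on the examples where the labeling function votes."""
    voted = [(lf(x), y) for x, y in zip(xs, ys) if lf(x) != ABSTAIN]
    if not voted:
        return float("nan")
    return sum(v == y for v, y in voted) / len(voted)


# Two hypothetical keyword heuristics standing in for labeling functions.
def lf_has_link(x: str) -> int:
    return SPAM if "http" in x.lower() else ABSTAIN


def lf_says_free(x: str) -> int:
    return SPAM if "free" in x.lower() else ABSTAIN


xs = [
    "Check out http://win.example now",
    "see you at lunch",
    "FREE prize, claim today",
    "the notes are at http://docs.example",
]
ys = [SPAM, HAM, SPAM, HAM]

print(coverage(lf_has_link, xs), accuracy(lf_has_link, xs, ys))    # 0.5 0.5
print(coverage(lf_says_free, xs), accuracy(lf_says_free, xs, ys))  # 0.25 1.0
print(set_coverage([lf_has_link, lf_says_free], xs))               # 0.75
```

Adding a second source raises the coverage of the set from 0.5 to 0.75, while each source's accuracy is still judged only where it votes. This is the trade-off the findings refer to: more sources help only if their votes stay accurate.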
Abstract
Weak supervision is a popular framework for overcoming the labeled data bottleneck: the need to obtain labels for training data. In weak supervision, multiple noisy-but-cheap sources are used to provide guesses of the label and are aggregated to produce high-quality pseudolabels. These sources are often expressed as small programs written by domain experts -- and so are expensive to obtain. Instead, we argue for using code-generation models to act as coding assistants for crafting weak supervision sources. We study prompting strategies to maximize the quality of the generated sources, settling on a multi-tier strategy that incorporates multiple types of information. We explore how to best combine hand-written and generated sources. Using these insights, we introduce ScriptoriumWS, a weak supervision system that, when compared to hand-crafted sources, maintains accuracy and greatly…
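The abstract describes sources as small programs whose noisy votes are aggregated into pseudolabels, and it mentions combining hand-written and generated sources. The sketch below illustrates that pipeline under stated assumptions. It uses plain majority vote as a simple stand-in for a learned label model, because the abstract does not say which aggregator ScriptoriumWS uses. The "generated" functions are written by hand to imitate what a code-generation model might return, and no model is called. All function names, the `-1` abstain value, and the sentiment task are hypothetical. Merging the two source sets here is plain list concatenation, which does not reproduce the combination strategy the paper studies.

```python
from collections import Counter
from typing import Callable, List, Sequence

ABSTAIN = -1     # assumed convention: -1 means "no vote"
NEG, POS = 0, 1  # hypothetical sentiment task

LabelingFunction = Callable[[str], int]


def apply_lfs(lfs: List[LabelingFunction], xs: Sequence[str]) -> List[List[int]]:
    """Label matrix: one row per example, one column per labeling function."""
    return [[lf(x) for lf in lfs] for x in xs]


def majority_vote(row: Sequence[int]) -> int:
    """Aggregate one row of votes; all-abstain rows and ties stay unlabeled."""
    counts = Counter(v for v in row if v != ABSTAIN).most_common()
    if not counts:
        return ABSTAIN
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # tie between the top two labels
    return counts[0][0]


# "Hand-written" sources (hypothetical examples).
def lf_great(x: str) -> int:
    return POS if "great" in x.lower() else ABSTAIN


def lf_awful(x: str) -> int:
    return NEG if "awful" in x.lower() else ABSTAIN


# Written by hand here to imitate what a code-generation model might return;
# no model is called in this sketch.
def lf_generated_praise(x: str) -> int:
    return POS if any(w in x.lower() for w in ("love", "excellent")) else ABSTAIN


def lf_generated_complaint(x: str) -> int:
    return NEG if any(w in x.lower() for w in ("waste", "boring")) else ABSTAIN


xs = ["a great movie", "awful plot", "I love it", "a waste of time", "it was fine"]
hand = [lf_great, lf_awful]
generated = [lf_generated_praise, lf_generated_complaint]

hand_only = [majority_vote(r) for r in apply_lfs(hand, xs)]
combined = [majority_vote(r) for r in apply_lfs(hand + generated, xs)]
print(hand_only)  # [1, 0, -1, -1, -1]
print(combined)   # [1, 0, 1, 0, -1]
```

With only the hand-written sources, three of the five examples receive no pseudolabel. Adding the generated-style sources labels two more, which matches in miniature the coverage gain the abstract reports. In practice, a label model that weights sources by estimated accuracy would usually replace majority vote.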
Taxonomy
Topics: Service-Oriented Architecture and Web Services
