Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale
Stephen H. Bach, Daniel Rodriguez, Yintao Liu, Chong Luo, Haidong Shao, Cassandra Xia, Souvik Sen, Alexander Ratner, Braden Hancock, Houman Alborzi, Rahul Kuchhal, Christopher Ré, Rob Malkin

TL;DR
This paper introduces Snorkel DryBell, a weak supervision management system that uses existing organizational knowledge resources to cut labeling cost and development time for machine learning applications. On three classification tasks at Google, it produces classifiers comparable in quality to ones trained with tens of thousands of hand-labeled examples.
Contribution
It extends the Snorkel framework with flexible, template-based ingestion of organizational knowledge, cross-feature production serving, and scalable, sampling-free execution, enabling practical deployment of weak supervision at industrial scale.
Findings
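Achieves classifier quality comparable to classifiers trained with tens of thousands of hand-labeled examples.
Converts non-servable organizational resources into servable models, for an average 52% performance improvement.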
Processes millions of data points in tens of minutes.
Abstract
Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications. We present a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision in order to bring development time and cost down by an order of magnitude, and introduce Snorkel DryBell, a new weak supervision management system for this setting. Snorkel DryBell builds on the Snorkel framework, extending it in three critical aspects: flexible, template-based ingestion of diverse organizational knowledge, cross-feature production serving, and scalable, sampling-free execution. On three classification tasks at Google, we find that Snorkel DryBell creates classifiers of comparable quality to ones trained with tens of thousands of hand-labeled examples, converts non-servable organizational resources to servable models for an…
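The weak supervision idea the abstract describes can be illustrated with a minimal sketch: several noisy labeling functions each vote on an example (or abstain), and the votes are combined into a training label. Everything below is an illustrative assumption rather than the paper's implementation: the labeling functions, keywords, and example texts are hypothetical, and the votes are combined with a simple majority vote, whereas Snorkel itself learns a generative label model over the labeling functions' outputs.

```python
from collections import Counter
from typing import Callable, List

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical labeling functions: each encodes one noisy heuristic or
# knowledge resource and abstains when it has no opinion.
def lf_keyword(text: str) -> int:
    return POSITIVE if "sale" in text.lower() else ABSTAIN

def lf_blocklist(text: str) -> int:
    return NEGATIVE if "unsubscribe" in text.lower() else ABSTAIN

def lf_length(text: str) -> int:
    return NEGATIVE if len(text.split()) < 3 else ABSTAIN

def majority_vote(votes: List[int]) -> int:
    """Combine votes by simple majority; abstain on ties or when no one votes.

    This is a stand-in for a learned label model, not Snorkel's method.
    """
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

def weak_labels(texts: List[str], lfs: List[Callable[[str], int]]) -> List[int]:
    """Apply every labeling function to every text and combine the votes."""
    return [majority_vote([lf(t) for lf in lfs]) for t in texts]

LFS = [lf_keyword, lf_blocklist, lf_length]
docs = [
    "Big sale on shoes this weekend",
    "ok thanks",
    "Sale ends soon, unsubscribe here",
    "Meeting notes for the quarterly review",
]
labels = weak_labels(docs, LFS)
print(labels)
```

Examples that end up labeled `ABSTAIN` would simply be left out of the training set in this sketch; the remaining weakly labeled examples would then train an ordinary classifier.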
Taxonomy
Topics: Data Quality and Management · Explainable Artificial Intelligence (XAI) · Software Engineering Research
