Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Jonas Golde, Patrick Haller, Felix Hamborg, Julian Risch, Alan Akbik

TL;DR
Fabricator is an open-source toolkit that simplifies the process of generating labeled training data for NLP tasks using large language models, enabling efficient dataset creation for training smaller models.
Contribution
It introduces a versatile, easy-to-use toolkit that supports multiple NLP tasks and integrates with existing libraries to facilitate reproducible dataset generation with LLMs.
Findings
Supports NLP tasks such as text classification, question answering, and entity recognition.
Facilitates quick experimentation with dataset generation workflows.
Aims to improve reproducibility and accessibility in dataset creation.
Abstract
Most NLP tasks are modeled as supervised learning and thus require labeled training data to train effective models. However, manually producing such data at sufficient quality and quantity is known to be costly and time-intensive. Current research addresses this bottleneck by exploring a novel paradigm called zero-shot learning via dataset generation. Here, a powerful LLM is prompted with a task description to generate labeled data that can be used to train a downstream NLP model. For instance, an LLM might be prompted to "generate 500 movie reviews with positive overall sentiment, and another 500 with negative sentiment." The generated data could then be used to train a binary sentiment classifier, effectively leveraging an LLM as a teacher to a smaller student model. With this demo, we introduce Fabricator, an open-source Python toolkit for dataset generation. Fabricator implements…
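The teacher-student workflow described in the abstract can be illustrated with a minimal sketch. This is not Fabricator's API: the prompt template, the `generate_labeled_dataset` function, and the `stub_teacher` stand-in are all hypothetical names chosen for illustration, and the stub replaces a real LLM call so the example runs offline.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical prompt template, modeled on the movie-review example in the
# abstract. It is an assumption, not a template shipped with Fabricator.
PROMPT_TEMPLATE = "Generate a movie review with {label} overall sentiment."


def generate_labeled_dataset(
    teacher: Callable[[str], str],
    labels: List[str],
    examples_per_label: int,
) -> List[Dict[str, str]]:
    """Prompt the teacher once per example and attach the prompted label."""
    dataset = []
    for label in labels:
        prompt = PROMPT_TEMPLATE.format(label=label)
        for _ in range(examples_per_label):
            # The label is known from the prompt, so no manual annotation
            # step is needed for the generated text.
            dataset.append({"text": teacher(prompt), "label": label})
    return dataset


def stub_teacher(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: any str -> str callable)."""
    return f"[generated text for: {prompt}]"


dataset = generate_labeled_dataset(stub_teacher, ["positive", "negative"], 3)
label_counts = Counter(example["label"] for example in dataset)
print(label_counts)
```

In practice, `stub_teacher` would be swapped for a function that queries an actual LLM, and the resulting records would be used to train the smaller student classifier.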
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
