TnT-LLM: Text Mining at Scale with Large Language Models
Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan

TL;DR
TnT-LLM leverages large language models to automate label taxonomy creation and data annotation for text mining, reducing manual effort and enabling scalable, accurate classification in real-world applications.
Contribution
The paper introduces TnT-LLM, a novel two-phase framework that uses LLMs for automated label generation and assignment, addressing the challenge of large-scale, low-supervision text mining.
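The two phases named above (label generation, then label assignment) can be illustrated with a minimal sketch. It is not the authors' implementation. The prompts, the function names `generate_taxonomy` and `assign_labels`, the `fallback` label, and the rule-based `toy_llm` stand-in are all assumptions made so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def generate_taxonomy(texts: List[str], llm: LLM, max_labels: int = 5) -> List[str]:
    """Phase 1 (sketch): ask the LLM to propose category labels for a sample of texts."""
    prompt = (
        f"Propose at most {max_labels} short category labels, one per line, "
        "that organize the following texts:\n"
        + "\n".join(f"- {t}" for t in texts)
    )
    lines = [line.strip() for line in llm(prompt).splitlines()]
    # Drop empty lines and deduplicate while preserving order.
    labels = list(dict.fromkeys(line for line in lines if line))
    return labels[:max_labels]


def assign_labels(
    texts: List[str], labels: List[str], llm: LLM, fallback: str = "Other"
) -> Dict[str, str]:
    """Phase 2 (sketch): ask the LLM to pick one label per text (pseudo labels)."""
    assigned: Dict[str, str] = {}
    for text in texts:
        prompt = (
            "Choose exactly one label from [" + ", ".join(labels) + "] "
            "for this text:\n" + text
        )
        answer = llm(prompt).strip()
        # Guard against answers outside the taxonomy (assumed policy).
        assigned[text] = answer if answer in labels else fallback
    return assigned


def toy_llm(prompt: str) -> str:
    """Rule-based stand-in for a real model, used only to make the sketch runnable."""
    if prompt.startswith("Propose"):
        return "Coding\nTravel\nCoding\n"
    text = prompt.splitlines()[-1].lower()
    return "Coding" if "python" in text else "Travel"


if __name__ == "__main__":
    docs = ["How do I sort a list in Python?", "Best time to visit Kyoto?"]
    taxonomy = generate_taxonomy(docs, toy_llm)
    print(taxonomy)
    print(assign_labels(docs, taxonomy, toy_llm))
```

In practice `toy_llm` would be swapped for a call to a real model. Keeping the model behind a plain callable makes both phases testable without network access.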
Findings
TnT-LLM produces more accurate label taxonomies than state-of-the-art baselines.
The framework achieves a good balance between accuracy and efficiency in large-scale classification.
Experiments demonstrate the practical viability of LLM-based text mining in real-world scenarios.
Abstract
Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. In the first phase, we…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
