Large Language Model as a Teacher for Zero-shot Tagging at Extreme Scales
Jinbin Zhang, Nasib Ullah, Rohit Babbar

TL;DR
This paper presents LMTX, a framework that uses large language models to generate high-quality pseudo labels for zero-shot extreme multi-label classification, combining the accuracy of LLMs with the efficiency of lightweight bi-encoders.
Contribution
LMTX introduces a novel training approach where LLMs serve as teachers to improve pseudo label quality, enabling efficient inference without LLMs at test time.
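One way to picture the teacher role described above is a two-stage loop: a cheap retriever shortlists candidate labels for each unlabeled document, and a teacher keeps only the candidates it judges relevant. The pairs that survive become pseudo labels for training the bi-encoder. The sketch below is an illustration under assumptions, not the authors' implementation. The names `shortlist`, `teacher_is_relevant`, and `build_pseudo_labels` are hypothetical, the token-overlap scorer stands in for an embedding model, and the substring rule stands in for an actual LLM call.

```python
from typing import Callable, Dict, List


def shortlist(doc: str, labels: List[str], k: int) -> List[str]:
    """Stand-in retriever (assumption): rank labels by token overlap with the document."""
    doc_tokens = set(doc.lower().split())

    def overlap(label: str) -> int:
        return len(doc_tokens & set(label.lower().split()))

    # sorted() is stable, so labels with equal overlap keep their original order.
    return sorted(labels, key=overlap, reverse=True)[:k]


def teacher_is_relevant(doc: str, label: str) -> bool:
    """Stand-in for an LLM relevance judgment (assumption): a simple substring rule."""
    return label.lower() in doc.lower()


def build_pseudo_labels(
    docs: List[str],
    labels: List[str],
    k: int,
    teacher: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Keep only the shortlisted labels that the teacher accepts for each document."""
    pseudo: Dict[str, List[str]] = {}
    for doc in docs:
        candidates = shortlist(doc, labels, k)
        pseudo[doc] = [label for label in candidates if teacher(doc, label)]
    return pseudo


docs = ["Wireless noise cancelling headphones with long battery life"]
labels = ["headphones", "battery charger", "garden hose", "noise cancelling"]
pseudo = build_pseudo_labels(docs, labels, k=3, teacher=teacher_is_relevant)
print(pseudo[docs[0]])
```

In this toy run the retriever shortlists "battery charger" because it shares a token with the document, and the teacher rejects it. That is the kind of noisy candidate a teacher is meant to filter out before training.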
Findings
LMTX is reported to outperform existing EZ-XMC methods in both accuracy and efficiency.
It achieves state-of-the-art results on EZ-XMC tasks.
It needs no LLM at inference time, which reduces computational cost.
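To make the inference-cost point concrete, the following sketch shows what LLM-free prediction with a bi-encoder can look like: label embeddings are computed once offline, and each new document is scored against them with a dot product. This is a minimal illustration under assumptions, not the paper's code. `VOCAB`, `encode`, and `top_k` are hypothetical, and the word-count encoder is a toy stand-in for a trained encoder.

```python
import math
from typing import Dict, List, Tuple

VOCAB = ["camera", "lens", "tent", "hiking", "battery"]  # toy vocabulary (assumption)


def encode(text: str) -> List[float]:
    """Toy encoder (assumption): L2-normalised word counts over VOCAB."""
    tokens = text.lower().split()
    vec = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def top_k(doc: str, label_index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every label by dot product with the document embedding; return the k best."""
    query = encode(doc)
    scored = [
        (label, sum(a * b for a, b in zip(query, vec)))
        for label, vec in label_index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


labels = ["camera lens", "hiking tent", "camera battery"]
label_index = {label: encode(label) for label in labels}  # built once, offline
predictions = top_k("compact camera with spare battery", label_index, k=2)
print([label for label, _ in predictions])
```

No LLM call appears anywhere on this path. At real label-set sizes the exhaustive scoring loop would typically be replaced by an approximate nearest-neighbour index.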
Abstract
Extreme Multi-label Text Classification (XMC) entails selecting the most relevant labels for an instance from a vast label set. Extreme Zero-shot XMC (EZ-XMC) extends this challenge by operating without annotated data, relying only on raw text instances and a predefined label set, making it particularly critical for addressing cold-start problems in large-scale recommendation and categorization systems. State-of-the-art methods, such as MACLR and RTS, leverage lightweight bi-encoders but rely on suboptimal pseudo labels for training, such as document titles (MACLR) or document segments (RTS), which may not align well with the intended tagging or categorization tasks. On the other hand, LLM-based approaches, like ICXML, achieve better label-instance alignment but are computationally expensive and impractical for real-world EZ-XMC applications due to their heavy inference costs. In this…
Taxonomy
Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training
