LLM4Tag: Automatic Tagging System for Information Retrieval via Large Language Models
Ruiming Tang, Chenxu Zhu, Bo Chen, Weipeng Zhang, Menghui Zhu, Xinyi Dai, Huifeng Guo

TL;DR
LLM4Tag is an automatic tagging system built on large language models, designed to improve tag relevance, adapt to new domains, and provide reliable tag confidence scores for information retrieval applications.
Contribution
The paper introduces an LLM-based tagging framework with three modules (graph-based tag recall, knowledge-enhanced tag generation, and tag confidence calibration), each addressing a key limitation of existing methods.
Findings
Outperforms state-of-the-art baselines on large-scale datasets
Successfully deployed online for content tagging serving millions of users
Significantly improves tag relevance and confidence estimation
Abstract
Tagging systems play an essential role in various information retrieval applications such as search engines and recommender systems. Recently, Large Language Models (LLMs) have been applied in tagging systems due to their extensive world knowledge, semantic understanding, and reasoning capabilities. Despite achieving remarkable performance, existing methods still have limitations, including difficulties in retrieving relevant candidate tags comprehensively, challenges in adapting to emerging domain-specific knowledge, and the lack of reliable tag confidence quantification. To address these three limitations, we propose LLM4Tag, an automatic tagging system. First, a graph-based tag recall module is designed to effectively and comprehensively construct a small-scale, highly relevant candidate tag set. Subsequently, a knowledge-enhanced tag generation module is employed to generate…
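To make the idea of a recall stage concrete, here is a minimal Python sketch of one way a graph-based step could narrow a large tag vocabulary to a small candidate set. It is an illustration under stated assumptions, not the paper's implementation: the toy data, the neighbor graph, the count-based ranking, and the names CONTENT_TAGS, CONTENT_NEIGHBORS, and recall_candidate_tags are all hypothetical, since the abstract does not specify how the graph is built or traversed.

```python
from collections import Counter

# Hypothetical toy data: already-tagged content items and their tags.
CONTENT_TAGS = {
    "c1": {"football", "premier league"},
    "c2": {"football", "transfer news"},
    "c3": {"cooking", "pasta"},
}

# Hypothetical similarity edges between content items
# (in practice these might come from embeddings; assumed here).
CONTENT_NEIGHBORS = {
    "new": ["c1", "c2"],
    "c1": ["c2"],
    "c2": ["c1"],
    "c3": [],
}


def recall_candidate_tags(content_id, neighbors, content_tags, top_k=3):
    """Collect tags carried by neighboring content and rank them by frequency."""
    counts = Counter()
    for neighbor in neighbors.get(content_id, []):
        counts.update(content_tags.get(neighbor, set()))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_k]]


# Candidate tags for an untagged item that is linked to c1 and c2.
candidates = recall_candidate_tags("new", CONTENT_NEIGHBORS, CONTENT_TAGS)
```

In a pipeline like the one the abstract outlines, a candidate list of this kind would then be passed to a generation stage and a confidence stage; those stages are not sketched here because the abstract text is truncated before describing them.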
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
