TL;DR
This paper introduces an iterative, LLM-driven method to refine category definitions for zero-shot web content classification, significantly enhancing accuracy without retraining models.
Contribution
It proposes a training-free, adaptive framework that optimizes category definitions iteratively using LLMs, improving zero-shot classification performance.
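A minimal Python sketch of the feedback loop described above follows. It makes several assumptions. The classifier and the definition refiner are passed in as callables. Misclassified development samples serve as the feedback. Refinement stops once a round fails to raise accuracy. The stopping rule, the names `refine_definitions`, `keyword_classify`, and `stub_refine`, and the toy data are all hypothetical. In the paper's setting, `refine` would wrap an LLM prompt and `classify` would be an embedding-based zero-shot classifier.

```python
from typing import Callable

Definitions = dict[str, str]
Sample = tuple[str, str]        # (page text, gold label)
Error = tuple[str, str, str]    # (page text, gold label, predicted label)
Classifier = Callable[[str, Definitions], str]
Refiner = Callable[[Definitions, list[Error]], Definitions]


def accuracy(classify: Classifier, defs: Definitions, dev: list[Sample]) -> float:
    # Fraction of development samples assigned their gold label.
    return sum(classify(text, defs) == gold for text, gold in dev) / len(dev)


def refine_definitions(defs: Definitions, dev: list[Sample],
                       classify: Classifier, refine: Refiner,
                       max_iters: int = 5) -> tuple[Definitions, float]:
    best = dict(defs)
    best_acc = accuracy(classify, best, dev)
    for _ in range(max_iters):
        # Feedback signal: every dev sample the current definitions misclassify.
        errors = []
        for text, gold in dev:
            pred = classify(text, best)
            if pred != gold:
                errors.append((text, gold, pred))
        if not errors:
            break  # every dev sample is already classified correctly
        # In the described framework an LLM rewrites the definitions from this
        # feedback; here `refine` is any callable that plays that role.
        candidate = refine(best, errors)
        candidate_acc = accuracy(classify, candidate, dev)
        if candidate_acc <= best_acc:
            break  # assumed stopping rule: keep only strict improvements
        best, best_acc = candidate, candidate_acc
    return best, best_acc


def keyword_classify(text: str, defs: Definitions) -> str:
    # Hypothetical stand-in classifier: pick the label whose definition shares
    # the most words with the text (ties go to the first label).
    words = set(text.lower().split())
    return max(defs, key=lambda label: len(words & set(defs[label].lower().split())))


def stub_refine(defs: Definitions, errors: list[Error]) -> Definitions:
    # Hypothetical placeholder for the LLM call: append each misclassified
    # text to its gold label's definition, returning a new dict.
    new_defs = dict(defs)
    for text, gold, _pred in errors:
        new_defs[gold] = f"{new_defs[gold]} {text}"
    return new_defs


defs = {"gambling": "casino betting", "finance": "banking loans"}
dev = [("online casino bonus", "gambling"), ("stock trading tips", "finance")]
refined, acc = refine_definitions(defs, dev, keyword_classify, stub_refine)
print(acc)  # 1.0
```

Accepting a candidate only when development accuracy rises keeps the loop monotone. This summary does not say whether the paper uses this rule, a fixed iteration budget, or something else.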
Findings
Iterative refinement improves classification accuracy across models.
The approach reduces semantic overlap caused by ambiguous definitions.
A new benchmark dataset with 10 URL categories and 1,000 samples per class is introduced.
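One way to make "semantic overlap" concrete is to compare the category definitions with each other in the embedding space and flag pairs that sit too close together. The sketch below does this with pairwise cosine similarity. The bag-of-words `embed` function stands in for a real embedding model. The 0.5 threshold, the function name `overlapping_pairs`, and the example definitions are illustrative assumptions, not values from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a term-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def overlapping_pairs(defs: dict[str, str],
                      threshold: float = 0.5) -> list[tuple[str, str, float]]:
    # Flag every pair of category definitions whose embeddings are at least
    # `threshold` similar; such pairs are candidates for rewriting.
    vecs = {label: embed(text) for label, text in defs.items()}
    flagged = []
    for a, b in combinations(defs, 2):
        sim = cosine(vecs[a], vecs[b])
        if sim >= threshold:
            flagged.append((a, b, round(sim, 3)))
    return flagged


definitions = {
    "news": "websites that publish articles",
    "blogs": "websites that publish personal articles",
    "gambling": "online casino and betting",
}
print(overlapping_pairs(definitions))  # [('news', 'blogs', 0.894)]
```

Here the "news" and "blogs" definitions differ by a single word, so their embeddings nearly coincide. This is the kind of ambiguity that would push pages of either category toward the wrong label.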
Abstract
Web filtering systems rely on accurate web content classification to block cyber threats, prevent data exfiltration, and ensure compliance. However, classification is increasingly difficult due to the dynamic and rapidly evolving nature of the modern web. Embedding-based zero-shot approaches map content and category descriptions into a shared semantic space, enabling label assignment without labeled training data, but remain highly sensitive to definition quality. Poorly specified or ambiguous definitions create semantic overlap in the embedding space, leading to systematic misclassification. In this paper, we propose a training-free, adaptive iterative definition refinement framework that improves zero-shot web content classification by progressively optimizing category definitions rather than updating model parameters. Using LLMs as feedback-driven definition optimizers, we…
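The embedding-based zero-shot step can be sketched as nearest-definition assignment. The content and every category definition are embedded, and the content receives the label whose definition has the highest cosine similarity to it. This is a minimal illustration under the following assumptions: a hypothetical bag-of-words `embed` replaces a real sentence-embedding model, and the category names and definitions are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence-embedding model: a term-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(content: str, definitions: dict[str, str]) -> str:
    # Embed the content once, then pick the label whose definition is closest.
    content_vec = embed(content)
    return max(definitions,
               key=lambda label: cosine(content_vec, embed(definitions[label])))


definitions = {
    "gambling": "online casino betting poker slots and wagering sites",
    "news": "journalism articles reporting current events and headlines",
}
print(zero_shot_classify("play poker and slots at our online casino", definitions))
```

The label depends only on similarity to the definition text. Rewriting a definition therefore moves the decision boundary without touching any model weights, and this is the lever the refinement framework uses.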
