Out-of-Category Document Identification Using Target-Category Names as Weak Supervision
Dongha Lee, Dongmin Hyun, Jiawei Han, Hwanjo Yu

TL;DR
This paper introduces out-of-category detection, which uses target-category names as weak supervision to separate documents that are relevant to the specified categories from those that are not, without requiring extensive labeled data.
Contribution
It proposes a two-step framework combining pseudo-label generation and neural classification to improve out-of-category detection using minimal supervision.
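To make the two-step idea concrete, here is a toy sketch in Python. It is not the authors' implementation. It assumes that pseudo-labels come from exact matches on the category name, and it swaps in a small multinomial naive Bayes model for the neural classifier. The function names, the example documents, and the 0.6 threshold are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def pseudo_label(docs, category_names):
    """Step 1 (assumed heuristic): label a document only if it mentions exactly one category name."""
    labeled = []
    for doc in docs:
        tokens = set(tokenize(doc))
        hits = [c for c in category_names if c in tokens]
        if len(hits) == 1:
            labeled.append((doc, hits[0]))
    return labeled

def train(labeled, category_names):
    """Step 2 (stand-in for a neural classifier): collect naive Bayes counts from pseudo-labeled documents."""
    word_counts = {c: Counter() for c in category_names}
    doc_counts = Counter()
    for doc, c in labeled:
        word_counts[c].update(tokenize(doc))
        doc_counts[c] += 1
    vocab = {w for c in category_names for w in word_counts[c]}
    return word_counts, doc_counts, vocab

def confidence(doc, model, category_names):
    """Maximum posterior over target categories; a low value suggests an out-of-category document."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_probs = []
    for c in category_names:
        total_words = sum(word_counts[c].values())
        # Laplace-smoothed prior and likelihoods; words outside the vocabulary are skipped.
        lp = math.log((doc_counts[c] + 1) / (total_docs + len(category_names)))
        for w in tokenize(doc):
            if w in vocab:
                lp += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        log_probs.append(lp)
    m = max(log_probs)
    probs = [math.exp(lp - m) for lp in log_probs]
    return max(probs) / sum(probs)

category_names = ["sports", "politics"]
docs = [
    "sports fans watched the football match",
    "the football team won the match",
    "politics news covers the election vote",
    "the senate election vote was close",
    "fresh pasta recipe with tomato sauce",
]

model = train(pseudo_label(docs, category_names), category_names)
scores = [confidence(d, model, category_names) for d in docs]
threshold = 0.6  # hypothetical cut-off chosen for this toy corpus
flagged = [d for d, s in zip(docs, scores) if s < threshold]
print(flagged)
```

In this toy corpus, the recipe document shares no vocabulary with either target category, so its posterior stays at the uniform prior and it falls below the threshold. A real system would need a far more robust pseudo-labeling step and confidence score than this sketch provides.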
Findings
Achieves superior detection performance over baseline methods.
Effectively utilizes category names as weak supervision.
Demonstrates robustness across various target categories.
Abstract
Identifying outlier documents, whose content differs from that of the majority of documents in a corpus, has played an important role in managing large text collections. However, because explicit information about the inlier (or target) distribution is absent, existing unsupervised outlier detectors are likely to produce unreliable results depending on the density or diversity of the outliers in the corpus. To address this challenge, we introduce a new task referred to as out-of-category detection, which aims to distinguish documents according to their semantic relevance to the inlier (or target) categories by using the category names as weak supervision. In practice, this task is widely applicable: the scope of the target categories can be flexibly designated according to users' interests, and only the target-category names are required as minimum guidance. In this paper, we…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
