Learning Only from Relevant Keywords and Unlabeled Documents
Nontawat Charoenphakdee, Jongyeong Lee, Yiping Jin, Dittaya Wanvarie, Masashi Sugiyama

TL;DR
This paper introduces a new learning framework for document classification that operates solely on relevant keywords and unlabeled documents, providing theoretical guarantees and flexibility in model choice.
Contribution
It presents a simple, theoretically grounded approach to train classifiers using only keywords and unlabeled data, compatible with various models and optimization objectives.
Findings
Effective AUC optimization demonstrated on benchmark datasets
Framework adaptable to optimize accuracy and F1-measure
Flexible implementation with linear models or neural networks
Abstract
We consider a document classification problem where document labels are absent and only relevant keywords of a target class and unlabeled documents are given. Although heuristic methods based on pseudo-labeling have been considered, theoretical understanding of this problem remains limited. Moreover, previous methods cannot easily incorporate well-developed techniques in supervised text classification. In this paper, we propose a theoretically guaranteed learning framework that is simple to implement and has flexible choices of models, e.g., linear models or neural networks. We demonstrate how to optimize the area under the receiver operating characteristic curve (AUC) effectively and also discuss how to adjust it to optimize other well-known evaluation metrics such as the accuracy and F1-measure. Finally, we show the effectiveness of our framework using benchmark datasets.
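The abstract does not spell out the algorithm, so the following is only a rough illustration of the problem setting, not the paper's method. It assumes a simple rule (split unlabeled documents by whether they contain a relevant keyword) and then trains a linear scorer with a pairwise logistic surrogate of AUC. The function names, the splitting rule, the bag-of-words features, and the toy documents are all assumptions made for this sketch, and it carries none of the paper's theoretical guarantees.

```python
import numpy as np

def split_by_keywords(docs, keywords):
    """Indices of documents containing any keyword, and of those containing none."""
    kw = {k.lower() for k in keywords}
    hit, miss = [], []
    for i, doc in enumerate(docs):
        tokens = set(doc.lower().split())
        (hit if tokens & kw else miss).append(i)
    return hit, miss

def bag_of_words(docs):
    """Plain term-count features over a whitespace-tokenised vocabulary."""
    vocab = sorted({t for d in docs for t in d.lower().split()})
    index = {t: j for j, t in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for t in d.lower().split():
            X[i, index[t]] += 1.0
    return X

def train_pairwise_auc(X_hit, X_miss, lr=0.1, epochs=200):
    """Gradient descent on the mean of log(1 + exp(-(w.x_hit - w.x_miss))) over all pairs."""
    dim = X_hit.shape[1]
    diffs = (X_hit[:, None, :] - X_miss[None, :, :]).reshape(-1, dim)
    w = np.zeros(dim)
    for _ in range(epochs):
        margins = diffs @ w
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
        grad = -(diffs / (1.0 + np.exp(margins))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def empirical_auc(scores_hit, scores_miss):
    """Fraction of (hit, miss) pairs ranked correctly; ties count one half."""
    diff = scores_hit[:, None] - scores_miss[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Toy data (hypothetical): "football" is the only relevant keyword.
docs = [
    "the team won the football match",
    "football players train every day",
    "the stock market fell sharply",
    "investors sold shares in the market",
    "the goalkeeper saved a penalty",
]
hit, miss = split_by_keywords(docs, ["football"])
X = bag_of_words(docs)
w = train_pairwise_auc(X[hit], X[miss])
auc = empirical_auc(X[hit] @ w, X[miss] @ w)
```

The last toy document is about the target topic but contains no keyword, so this naive split puts it on the wrong side. That kind of noise in keyword-based splits is one reason heuristic pseudo-labeling is hard to analyze, which is the gap the abstract says the proposed framework addresses.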
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Machine Learning and Algorithms
