Towards Open-Domain Topic Classification
Hantian Ding, Jinrui Yang, Yuqian Deng, Hongming Zhang, Dan Roth

TL;DR
This paper presents a zero-shot, open-domain topic classification system that accepts user-defined labels in real time. Its label-aware classifier, trained on a new dataset constructed from Wikipedia, draws on the implicit knowledge of a pretrained language model to handle labels it has never seen.
Contribution
It introduces a zero-shot, label-aware classifier that classifies text against a user-defined taxonomy in real time in open-domain settings, trained on a new dataset constructed from Wikipedia.
Findings
Significant improvement over existing zero-shot baselines in open-domain scenarios
Competitive performance with weakly-supervised models trained on in-domain data
Effective handling of unseen labels across four datasets from various domains with different label sets
Abstract
We introduce an open-domain topic classification system that accepts a user-defined taxonomy in real time. Users will be able to classify a text snippet with respect to any candidate labels they want, and get an instant response from our web interface. To obtain such flexibility, we build the backend model in a zero-shot way. By training on a new dataset constructed from Wikipedia, our label-aware text classifier can effectively utilize implicit knowledge in the pretrained language model to handle labels it has never seen before. We evaluate our model across four datasets from various domains with different label sets. Experiments show that the model significantly improves over existing zero-shot baselines in open-domain scenarios, and performs competitively with weakly-supervised models trained on in-domain data.
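The label-aware interface described in the abstract can be sketched roughly as follows: every (text, candidate label) pair is scored, and the labels are returned ranked by score, so the label set can change on each call. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `classify`, `Scorer`, and `token_overlap_score` are hypothetical, and the toy token-overlap scorer is only a placeholder for the pretrained language model that the paper uses to score a text against a label.

```python
import re
from typing import Callable, List, Sequence, Tuple

# A scorer maps (text, label) to a relevance score; higher means a better match.
# Assumption: in the real system this role is played by a pretrained language
# model; here it is a pluggable function so the sketch runs without any model.
Scorer = Callable[[str, str], float]


def token_overlap_score(text: str, label: str) -> float:
    """Toy stand-in scorer: fraction of label tokens that occur in the text."""
    text_tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    label_tokens = re.findall(r"[a-z0-9]+", label.lower())
    if not label_tokens:
        return 0.0
    hits = sum(1 for tok in label_tokens if tok in text_tokens)
    return hits / len(label_tokens)


def classify(
    text: str,
    labels: Sequence[str],
    scorer: Scorer = token_overlap_score,
) -> List[Tuple[str, float]]:
    """Score the text against each user-supplied label and rank the labels.

    The label set is an argument rather than a fixed output layer, which is
    what lets the taxonomy be defined by the user at request time.
    """
    if not labels:
        raise ValueError("at least one candidate label is required")
    scored = [(label, scorer(text, label)) for label in labels]
    # sorted() is stable, so tied labels keep the order the user gave them.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranking = classify(
        "The team won the football match last night",
        ["football", "stock market", "cooking"],
    )
    for label, score in ranking:
        print(f"{label}: {score:.2f}")
```

Swapping `token_overlap_score` for a model-backed scorer would leave `classify` unchanged, which is the design point the sketch is meant to illustrate.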
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
