Incorporating Word Embeddings into Open Directory Project based Large-scale Classification
Kang-Min Kim, Aliyeva Dinara, Byung-Ju Choi, SangKeun Lee

TL;DR
This paper enhances large-scale text classification by integrating word embeddings with the Open Directory Project, creating semantic category representations that significantly improve classification performance.
Contribution
It introduces a novel method to incorporate word embeddings into ODP-based classification, generating category vectors and a new similarity measure for improved accuracy.
Findings
10% improvement in macro-averaged F1-score
28% increase in precision at k
Effective large-scale classification performance enhancement
Abstract
Recently, implicit representation models, such as embedding or deep learning, have been successfully adopted to text classification task due to their outstanding performance. However, these approaches are limited to small- or moderate-scale text classification. Explicit representation models are often used in a large-scale text classification, like the Open Directory Project (ODP)-based text classification. However, the performance of these models is limited to the associated knowledge bases. In this paper, we incorporate word embeddings into the ODP-based large-scale classification. To this end, we first generate category vectors, which represent the semantics of ODP categories by jointly modeling word embeddings and the ODP-based text classification. We then propose a novel semantic similarity measure, which utilizes the category and word vectors obtained from the joint model and word…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
