Incorporating Word Embeddings into Open Directory Project based Large-scale Classification

Kang-Min Kim; Aliyeva Dinara; Byung-Ju Choi; SangKeun Lee

arXiv:1804.00828 · cs.CL · April 4, 2018 · 1 citation

Open Access

TL;DR

This paper enhances large-scale text classification by integrating word embeddings with the Open Directory Project, creating semantic category representations that significantly improve classification performance.

Contribution

It introduces a novel method to incorporate word embeddings into ODP-based classification, generating category vectors and a new similarity measure for improved accuracy.

Findings

1. Macro-averaged F1-score improves by 10%.

2. Precision at k improves by 28%.

3. Classification performance improves at large scale.
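The two metrics named in the findings can be illustrated with a minimal sketch. It assumes single-label predictions for macro-averaged F1 and a ranked list of predicted categories for precision at k; the paper's exact evaluation protocol is not given on this page, and the function names and toy labels below are hypothetical.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Compute F1 per category, then take the unweighted mean over categories."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked categories that are relevant."""
    return sum(1 for c in ranked[:k] if c in relevant) / k


# Toy data (hypothetical category labels, not from the paper)
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
score = macro_f1(y_true, y_pred)
p_at_2 = precision_at_k(["x", "y", "z"], {"x", "z"}, 2)
```

Macro averaging weights every category equally, so rare categories count as much as frequent ones, which matters in a taxonomy with many sparsely populated categories.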

Abstract

Recently, implicit representation models, such as embeddings or deep learning, have been successfully adopted for text classification tasks due to their outstanding performance. However, these approaches are limited to small- or moderate-scale text classification. Explicit representation models are often used in large-scale text classification, such as Open Directory Project (ODP)-based text classification. However, the performance of these models is limited by the associated knowledge bases. In this paper, we incorporate word embeddings into ODP-based large-scale classification. To this end, we first generate category vectors, which represent the semantics of ODP categories, by jointly modeling word embeddings and ODP-based text classification. We then propose a novel semantic similarity measure, which utilizes the category and word vectors obtained from the joint model and word…
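The general idea of comparing a document against category vectors can be sketched as below. This is not the paper's joint model or its proposed similarity measure, neither of which is specified in this excerpt; the sketch substitutes a plain average of word vectors and cosine similarity, and every name, vector, and category label in it is a hypothetical placeholder.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, word_vecs, category_vecs):
    """Return the category whose vector is closest to the averaged document vector."""
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    if not known:
        return None  # no in-vocabulary tokens, so no prediction
    doc = mean_vector(known)
    return max(category_vecs, key=lambda c: cosine(doc, category_vecs[c]))


# Toy 2-dimensional vectors (assumed values, for illustration only)
word_vecs = {"goal": [1.0, 0.0], "match": [0.9, 0.1], "stock": [0.0, 1.0]}
category_vecs = {"Sports/Soccer": [1.0, 0.1], "Business/Investing": [0.1, 1.0]}

predicted = classify(["goal", "match"], word_vecs, category_vecs)
```

Placing category vectors and word vectors in the same space is what makes a direct similarity comparison possible; how the paper learns that shared space is described in the full text rather than here.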

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques