A WL-SPPIM Semantic Model for Document Classification

Ming Li; Peilun Xiao; Ju Zhang

arXiv:1706.01758 · cs.CL · June 7, 2017

TL;DR

This paper introduces the WL-SPPIM semantic model for document classification and shows that it outperforms existing methods such as LDA, SGNS, and SPPIM in classification performance and scalability on three standard datasets (20newsgroups, Reuters52, and WebKB).

Contribution

The paper proposes WL-SPPIM, a new semantic model built on SPPIM that improves classification accuracy and scalability over SPPIM, SGNS, and LDA.

Findings

01

WL-SPPIM outperforms LDA, SGNS, and SPPIM in classification accuracy.

02

WL-SPPIM shows higher scalability in text classification tasks.

03

SPPIM is comparable or superior to SGNS on some datasets, but SGNS performs better on others because it takes weighting into account during decomposition.
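For background, the weighting argument can be stated using the analysis of SGNS as implicit matrix factorization by Levy and Goldberg (2014); the equations below come from that analysis, not from this paper, and the shifted positive PMI (SPPMI) matrix is assumed here to be what the paper calls SPPIM. Here #(w,c) is the number of times word w co-occurs with context c, #(w) and #(c) are the marginal counts, |D| is the total number of word-context pairs, k is the number of negative samples, and σ is the logistic function.

```latex
\begin{aligned}
\mathrm{PMI}(w,c) &= \log \frac{\#(w,c)\,|D|}{\#(w)\,\#(c)}, \qquad
\mathrm{SPPMI}_k(w,c) = \max\bigl(\mathrm{PMI}(w,c) - \log k,\; 0\bigr) \\
\ell(w,c) &= \#(w,c)\,\log \sigma(\vec{w}\cdot\vec{c})
  + k\,\frac{\#(w)\,\#(c)}{|D|}\,\log \sigma(-\vec{w}\cdot\vec{c}) \\
\frac{\partial \ell}{\partial (\vec{w}\cdot\vec{c})} = 0
  &\;\Longrightarrow\; \vec{w}\cdot\vec{c} = \mathrm{PMI}(w,c) - \log k
\end{aligned}
```

The optimum shows that SGNS implicitly factorizes the shifted PMI matrix, but each cell enters its objective with a weight that grows with #(w,c) and #(w)#(c)/|D|, so frequent pairs are fitted more closely. A plain SVD of the SPPMI matrix treats every cell's reconstruction error equally, which is one way to read the claim that SGNS benefits from weighting during decomposition.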

Abstract

In this paper, we explore an SPPIM-based text classification method. Experiments on three standard international text datasets, namely 20newsgroups, Reuters52, and WebKB, reveal that the SPPIM method is equal or even superior to the SGNS method in text classification. Compared with SGNS, however, although SPPIM provides a better solution, it is not necessarily better in text classification tasks. Based on our analysis, SGNS takes weighting into account during the decomposition process, so it performs better than SPPIM on some standard datasets. Inspired by this, we propose a WL-SPPIM semantic model based on the SPPIM model, and experiments show that the WL-SPPIM approach achieves better classification performance and higher scalability in text classification than the LDA, SGNS, and SPPIM approaches.
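The abstract does not spell out how SPPIM features feed a classifier. As a minimal sketch under stated assumptions, the code below builds a sparse SPPMI matrix from tokenized documents using a symmetric context window, represents each document as the average of its words' SPPMI rows, and classifies with a nearest-centroid rule using cosine similarity. The window size `window`, the shift `k`, the document representation, the classifier, and all function names (`sppmi_rows`, `doc_vector`, `cosine`, `train_centroids`, `predict`) are hypothetical choices for illustration, not the authors' pipeline; the sketch also omits any matrix decomposition and the weighting that WL-SPPIM adds.

```python
import math
from collections import Counter, defaultdict


def sppmi_rows(docs, window=2, k=1):
    """Build a sparse SPPMI matrix as {word: {context: value}}.

    SPPMI_k(w, c) = max(PMI(w, c) - log k, 0) with
    PMI(w, c) = log(#(w, c) * |D| / (#(w) * #(c))),
    where |D| is the total number of (word, context) pairs.
    """
    pair_counts = Counter()
    # Count (word, context) pairs inside a symmetric window of each document.
    for tokens in docs:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pair_counts[(w, tokens[j])] += 1

    # Marginal counts #(w) and #(c), and the total |D|.
    word_counts, context_counts = Counter(), Counter()
    for (w, c), n in pair_counts.items():
        word_counts[w] += n
        context_counts[c] += n
    total = sum(pair_counts.values())

    shift = math.log(k)
    rows = defaultdict(dict)
    for (w, c), n in pair_counts.items():
        value = math.log(n * total / (word_counts[w] * context_counts[c])) - shift
        if value > 0:  # keep only positive entries, so the matrix stays sparse
            rows[w][c] = value
    return dict(rows)


def doc_vector(tokens, rows):
    """Hypothetical document representation: average SPPMI row of known words."""
    known = [w for w in tokens if w in rows]
    vec = defaultdict(float)
    for w in known:
        for c, v in rows[w].items():
            vec[c] += v / len(known)
    return dict(vec)


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (0.0 if either is empty)."""
    dot = sum(v * b.get(c, 0.0) for c, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_docs, rows):
    """Average the document vectors of each class into a centroid."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for tokens, label in labeled_docs:
        counts[label] += 1
        class_sum = sums[label]  # create the entry even if no word is known
        for c, v in doc_vector(tokens, rows).items():
            class_sum[c] += v
    return {label: {c: v / counts[label] for c, v in vec.items()}
            for label, vec in sums.items()}


def predict(tokens, centroids, rows):
    """Return the label whose centroid is most cosine-similar to the document."""
    vec = doc_vector(tokens, rows)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

With `k = 1` the shift term vanishes and SPPMI reduces to positive PMI. Clipping at zero is what keeps the matrix sparse: pairs that never co-occur, or whose shifted PMI is negative, are simply not stored.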

Taxonomy

Topics: Text and Document Classification Technologies · Topic Modeling · Advanced Text Analysis Techniques

Methods: Latent Dirichlet Allocation