Semi-supervised Text Categorization Using Recursive K-means Clustering

Harsha S. Gowda, Mahamad Suhil, D.S. Guru, and Lavanya Narayana Raju

arXiv:1706.07913 · cs.LG · June 27, 2017


TL;DR

This paper introduces a semi-supervised text classification method that uses recursive K-means clustering to effectively label unlabeled documents and improve classification accuracy.

Contribution

It presents a novel recursive K-means based semi-supervised learning algorithm that enhances text categorization by better utilizing unlabeled data.

Findings

01

Outperforms recent state-of-the-art models on the 20Newsgroups dataset

02

Effective labeling of unlabeled documents improves classification accuracy

03

Recursive clustering achieves desired class-specific partitions

Abstract

In this paper, we present a semi-supervised learning algorithm for the classification of text documents, together with a method for labeling unlabeled text documents. The method follows a divide-and-conquer strategy: it applies the K-means algorithm recursively to partition a collection containing both labeled and unlabeled documents. K-means is applied to each partition in turn until every partition contains labeled documents of a single class. Once these clusters are obtained, their centroids are taken as representatives of the clusters, and the nearest neighbor rule is used to classify an unknown text document. A series of experiments has been conducted to demonstrate the superiority of the proposed model over other recent state-of-the-art models on the 20Newsgroups dataset.
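
The procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the document representation, the value of K at each level, or how partitions without labeled documents are handled. The sketch assumes documents are already dense feature vectors (for example TF-IDF), marks unlabeled documents with -1, sets K to the number of labeled classes present in a partition, discards partitions that contain no labeled documents, and falls back to the majority class when a partition cannot be split further. The names `kmeans`, `recursive_kmeans`, and `classify` are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, iters=20):
    """Plain Lloyd's K-means; returns one cluster index per row of X."""
    # Initialise centers from k distinct rows (copy, so X is not modified).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign


def majority_leaf(X, y):
    """Leaf labeled with the most frequent labeled class in the partition."""
    labeled = y[y >= 0]
    return [(X.mean(axis=0), int(np.bincount(labeled).argmax()))]


def recursive_kmeans(X, y, rng, max_depth=10):
    """Return a list of (centroid, label) leaves; y uses -1 for unlabeled."""
    classes = np.unique(y[y >= 0])
    if len(classes) == 0:
        return []  # assumption: drop partitions with no labeled documents
    if len(classes) == 1:
        # Desired partition: labeled documents of a single class.
        return [(X.mean(axis=0), int(classes[0]))]
    if max_depth == 0:
        return majority_leaf(X, y)  # assumption: depth guard, not in the paper
    # Assumption: K equals the number of labeled classes in this partition.
    assign = kmeans(X, len(classes), rng)
    parts = [assign == j for j in range(len(classes)) if np.any(assign == j)]
    if len(parts) < 2:
        return majority_leaf(X, y)  # K-means failed to split the partition
    leaves = []
    for mask in parts:
        leaves += recursive_kmeans(X[mask], y[mask], rng, max_depth - 1)
    return leaves


def classify(x, leaves):
    """Nearest neighbor rule over the leaf centroids."""
    centroids = np.array([c for c, _ in leaves])
    nearest = ((centroids - x) ** 2).sum(axis=1).argmin()
    return leaves[nearest][1]
```

To use the sketch, call `recursive_kmeans(X, y, np.random.default_rng(0))` on a float feature matrix `X` and an integer label vector `y`, then pass the returned leaves to `classify` together with the feature vector of a new document.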
