The Application of Active Query K-Means in Text Classification

Yukun Jiang

arXiv:2107.07682·cs.CL·July 19, 2021

TL;DR

This paper introduces an active query K-Means algorithm for text classification that improves accuracy and reduces labeling costs by combining semi-supervised clustering with active learning techniques.

Contribution

It extends traditional K-Means into a semi-supervised and active learning framework using Penalized Min-Max-selection for more efficient text classification.

Findings

01 Increased classification accuracy on a Chinese news dataset

02 Reduced labeling costs in the active learning process

03 Stable initial centroids through penalized selection

Abstract

Active learning is a state-of-the-art machine learning approach for dealing with an abundance of unlabeled data. In Natural Language Processing, annotating all the data is typically costly and time-consuming. This inefficiency motivates our application of active learning to text classification. In this research, traditional unsupervised k-means clustering is first modified into a semi-supervised version. Then, in a novel attempt, the algorithm is further extended to the active learning scenario with Penalized Min-Max-selection, so that a limited number of queries yields more stable initial centroids. The method uses both the interactive query results from users and the underlying distance representation. Tested on a Chinese news dataset, it shows a consistent increase in accuracy while lowering the cost of training.
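
The two stages named in the abstract can be illustrated with a rough sketch. This is not the paper's implementation: the abstract does not define the penalty, so the `penalty` term below (which discounts points that are far from the data on average) is a hypothetical stand-in. The function names `min_max_select` and `seeded_kmeans` are likewise made up, synthetic 2-D points stand in for document vectors, and the known labels `y[queried]` stand in for the user's answers to the queries.

```python
import numpy as np

def min_max_select(X, n_queries, penalty=0.0, rng=None):
    """Pick points to query: each new pick maximizes its minimum
    distance to the points already picked (farthest-first).
    `penalty` is a hypothetical knob, not the paper's definition."""
    rng = np.random.default_rng(rng)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    outlierness = dists.mean(axis=1)  # large for isolated points
    chosen = [int(rng.integers(X.shape[0]))]
    min_dist = dists[chosen[0]].copy()
    while len(chosen) < n_queries:
        score = min_dist - penalty * outlierness
        score[chosen] = -np.inf  # never pick the same point twice
        nxt = int(np.argmax(score))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, dists[nxt])
    return chosen

def seeded_kmeans(X, seed_idx, seed_labels, k, n_iter=20):
    """Semi-supervised k-means: centroids start from the labeled
    seeds, and seeds keep their queried label in every iteration."""
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    if set(seed_labels.tolist()) != set(range(k)):
        raise ValueError("every class needs at least one labeled seed")
    centroids = np.stack(
        [X[seed_idx[seed_labels == c]].mean(axis=0) for c in range(k)]
    )
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        assign[seed_idx] = seed_labels  # queried labels stay fixed
        new = np.stack([X[assign == c].mean(axis=0) for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign, centroids

# Toy data: three well-separated blobs standing in for document vectors.
data_rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
y = np.repeat(np.arange(3), 30)
X = centers[y] + data_rng.normal(scale=0.5, size=(90, 2))

queried = min_max_select(X, n_queries=6, penalty=0.1, rng=0)
labels, _ = seeded_kmeans(X, queried, y[queried], k=3)
accuracy = (labels == y).mean()
```

Because the centroids are initialized from labeled seeds, cluster indices coincide with class labels and can be compared to `y` directly. On real text, `X` would be whatever document representation is in use, and the distance could be swapped for one better suited to it.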

Taxonomy

Topics: Machine Learning and Algorithms · Text and Document Classification Technologies · Algorithms and Data Compression

Methods: k-Means Clustering