Text Classification and Clustering with Annealing Soft Nearest Neighbor Loss

Abien Fred Agarap

arXiv:2107.14597 · cs.LG · August 2, 2021 · 1 citation

TL;DR

This paper introduces a novel disentanglement-based loss function called Annealing Soft Nearest Neighbor Loss. It improves text classification and clustering by pulling same-class points closer together, relative to points of other classes, in the learned feature space.

Contribution

It proposes a new loss function that maximizes disentanglement in feature representations, yielding better text representations for classification and clustering tasks.
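For reference, the soft nearest neighbor loss (Frosst et al., 2019) that this loss builds on can be written as follows for a batch of b points x_i with labels y_i and temperature T. For each point, it compares the similarity mass on same-class points with the mass on all other points. Lower values mean a more disentangled batch in the sense defined in the abstract, so disentanglement is maximized by minimizing this quantity. The "annealing" in the name refers to varying T during training. This summary does not state the exact schedule.

```latex
\mathcal{L}_{\mathrm{SNN}}(x, y, T) = -\frac{1}{b} \sum_{i=1}^{b}
  \log \left(
    \frac{\sum_{j \neq i,\; y_j = y_i} \exp\!\left(-\lVert x_i - x_j \rVert^2 / T\right)}
         {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / T\right)}
  \right)
```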

Findings

1. Achieved 90.11% classification accuracy on AG News.
2. Obtained 88% clustering accuracy, outperforming baseline models.
3. Improved natural language representations without additional regularization.
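Clustering accuracy is commonly computed by finding the one-to-one mapping from cluster IDs to class labels that maximizes agreement, then scoring the mapped predictions. This summary does not define the metric, so the sketch below assumes that standard definition. The function name `clustering_accuracy` is hypothetical. AG News has four classes, so the sketch simply tries every mapping (4! = 24 permutations).

```python
from itertools import permutations

import numpy as np


def clustering_accuracy(y_true, y_pred) -> float:
    """Best-mapping clustering accuracy (assumed standard definition).

    Hypothetical helper: tries every one-to-one mapping of cluster IDs to
    class labels and returns the fraction of points that the best mapping
    labels correctly. Brute force is fine for a handful of clusters.
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    k = int(max(y_true.max(), y_pred.max())) + 1
    # counts[c, t] = number of points placed in cluster c whose true class is t
    counts = np.zeros((k, k), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)
    # perm[c] is the class label assigned to cluster c
    best = max(
        sum(counts[c, perm[c]] for c in range(k))
        for perm in permutations(range(k))
    )
    return best / y_true.size


if __name__ == "__main__":
    # Cluster IDs are arbitrary: a relabeled perfect clustering scores 1.0.
    print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```

With many clusters, the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` applied to the negated count matrix) finds the same optimal mapping without enumerating permutations.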

Abstract

We define disentanglement as how far apart class-different data points are from each other, relative to the distances among class-similar data points. By maximizing disentanglement during representation learning, we obtain a transformed feature representation in which the class memberships of the data points are preserved. If class memberships are preserved, we have a feature representation space in which a nearest neighbour classifier or a clustering algorithm would perform well. We use this method to learn better natural language representations and apply them to text classification and text clustering tasks. Through disentanglement, we obtain text representations with better-defined clusters and improve text classification performance. Our approach achieved a test classification accuracy as high as 90.11% and a test clustering accuracy of 88% on the AG…
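The definition of disentanglement above corresponds to the soft nearest neighbor loss (SNNL), where a low loss means each point's similarity mass falls mostly on same-class points. Below is a minimal NumPy sketch of a per-batch SNNL together with an annealed temperature. Several details are assumptions rather than the paper's method. The helper names are hypothetical. The schedule T = 1 / (1 + epoch) ** gamma with gamma = 0.55 is an illustrative assumption. In real training, the loss would be computed on a network's hidden representations inside an autodiff framework, not in plain NumPy.

```python
import numpy as np


def soft_nearest_neighbor_loss(features, labels, temperature: float) -> float:
    """Per-batch soft nearest neighbor loss; lower means more disentangled."""
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances, shape (b, b).
    diffs = features[:, None, :] - features[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Similarity kernel exp(-d^2 / T); zero the diagonal so no point is its own neighbor.
    sims = np.exp(-sq_dists / temperature)
    np.fill_diagonal(sims, 0.0)
    same_class = (labels[:, None] == labels[None, :]).astype(np.float64)
    eps = 1e-12  # guards against division by zero and log(0) when similarities underflow
    # Fraction of each point's similarity mass that lands on same-class points.
    ratio = np.sum(sims * same_class, axis=1) / (eps + np.sum(sims, axis=1))
    return float(-np.mean(np.log(eps + ratio)))


def annealed_temperature(epoch: int, gamma: float = 0.55) -> float:
    """ASSUMED schedule for illustration: T decays as 1 / (1 + epoch) ** gamma."""
    return 1.0 / (1.0 + epoch) ** gamma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([0] * 5 + [1] * 5)
    # Two tight, far-apart class clusters versus points with no class structure.
    separated = np.concatenate(
        [rng.normal(0.0, 0.1, (5, 2)), rng.normal(5.0, 0.1, (5, 2))]
    )
    mixed = rng.normal(0.0, 1.0, (10, 2))
    for epoch in range(3):
        t = annealed_temperature(epoch)
        print(epoch, t,
              soft_nearest_neighbor_loss(separated, labels, t),
              soft_nearest_neighbor_loss(mixed, labels, t))
```

The well-separated batch should score near zero, while the unstructured batch scores noticeably higher. The temperature sets the neighborhood scale. A large T spreads each point's similarity mass over the whole batch. A small T concentrates it on the nearest points.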

Taxonomy

Topics: Text and Document Classification Technologies · Topic Modeling · Web Data Mining and Analysis