Class-Aware Contrastive Optimization for Imbalanced Text Classification

Grigorii Khvatskii; Nuno Moniz; Khoa Doan; Nitesh V Chawla

arXiv:2410.22197 · cs.CL · October 30, 2024


Open Access

TL;DR

This paper introduces a class-aware contrastive optimization method combined with autoencoders to improve imbalanced text classification, outperforming existing approaches by enhancing class separation in embeddings.

Contribution

The paper proposes a novel combination of contrastive loss and autoencoder reconstruction to better handle class imbalance in text classification tasks.
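This page does not say how the two terms are formulated or weighted. The sketch below only illustrates the general idea of pairing a reconstruction term with a class-aware contrastive term. It assumes a mean-squared-error reconstruction loss, a supervised contrastive (SupCon-style) loss over L2-normalized embeddings, and a convex mixing weight `alpha`. The function names and the `alpha` and `tau` parameters are hypothetical and are not taken from the paper.

```python
import math


def mse(x, x_hat):
    # Mean squared error between a clean input vector and its reconstruction.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def supcon_loss(embeddings, labels, tau=0.1):
    # Assumed SupCon-style term (hypothetical choice): for each anchor, raise the
    # similarity of same-class embeddings relative to every other embedding in the batch.
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / tau
                for j in range(n) if j != i}
        m = max(sims.values())  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def combined_loss(inputs, reconstructions, embeddings, labels, alpha=0.5, tau=0.1):
    # Hypothetical convex mix: alpha trades reconstruction fidelity
    # against class separation in the embedding space.
    rec = sum(mse(x, xh) for x, xh in zip(inputs, reconstructions)) / len(inputs)
    return (1 - alpha) * rec + alpha * supcon_loss(embeddings, labels, tau)


# Usage: the same four embeddings score a lower contrastive loss when the labels
# match their geometry than when same-class points sit far apart.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
separated = supcon_loss(emb, [0, 0, 1, 1])
overlapping = supcon_loss(emb, [0, 1, 0, 1])  # separated < overlapping
```

Setting `alpha` near 0 makes the objective behave like a plain autoencoder. Setting it near 1 favors class separation at the expense of reconstruction quality. The actual balance used in the paper is not given here.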

Findings

01. Significant performance improvement over state-of-the-art methods
02. Effective class separation in embedding space
03. Robust across various text datasets

Abstract

The unique characteristics of text data make classification tasks a complex problem. Advances in unsupervised and semi-supervised learning, and in autoencoder architectures, have addressed several of these challenges. However, these approaches still struggle with imbalanced text classification tasks, a common scenario in real-world applications, and tend to produce embeddings with unfavorable properties such as class overlap. In this paper, we show that leveraging class-aware contrastive optimization combined with denoising autoencoders can successfully tackle imbalanced text classification tasks, achieving better performance than the current state-of-the-art. Concretely, our proposal combines reconstruction loss with contrastive class separation in the embedding space, allowing a better balance between the truthfulness of the generated embeddings and the model's ability to separate different…
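A denoising autoencoder is trained to reconstruct the clean input from a corrupted copy of it. The abstract does not state which corruption the authors apply, so the sketch below assumes simple masking noise, where each feature is zeroed independently with probability `drop_prob`. The helpers `mask_noise` and `denoising_pair` are hypothetical illustrations, not the paper's code.

```python
import random


def mask_noise(x, drop_prob=0.2, rng=None):
    # Assumed corruption: zero each entry independently with probability drop_prob.
    if rng is None:
        rng = random.Random()
    return [0.0 if rng.random() < drop_prob else v for v in x]


def denoising_pair(x, drop_prob=0.2, seed=None):
    # Build a (noisy_input, clean_target) pair: the encoder sees the corrupted
    # vector, and the reconstruction loss compares the decoder output to the clean one.
    rng = random.Random(seed)
    return mask_noise(x, drop_prob, rng), list(x)


# Usage: a seeded pair for one input vector (e.g. a document embedding).
noisy, target = denoising_pair([0.5, 1.0, 0.0, 2.0], drop_prob=0.5, seed=0)
```

Seeding the generator keeps the corruption reproducible across runs. During training a fresh mask would typically be drawn for every example and epoch.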


Taxonomy

Topics: Spam and Phishing Detection · Text and Document Classification Technologies · Imbalanced Data Classification Techniques

Methods: Sparse Evolutionary Training