Simple-Sampling and Hard-Mixup with Prototypes to Rebalance Contrastive Learning for Text Classification

Mengyu Li; Yonghao Liu; Fausto Giunchiglia; Ximing Li; Xiaoyue Feng; Renchu Guan

arXiv:2405.11524·cs.CL·January 26, 2026·1 cites

Open Access

TL;DR

This paper introduces SharpReCL, which uses class prototypes and balanced sampling to improve supervised contrastive learning for imbalanced text classification. The authors report that it outperforms popular large language models.

Contribution

The paper proposes a prototype-based balanced sampling method to enhance contrastive learning in imbalanced text classification tasks.

Findings

01. SharpReCL outperforms existing models on multiple datasets.

02. The method effectively handles data imbalance in contrastive learning.

03. Results surpass those of popular large language models.

Abstract

Text classification is a crucial and fundamental task in web content mining. Compared with the previous learning paradigm of pre-training and fine-tuning with cross-entropy loss, the recently proposed supervised contrastive learning approach has received tremendous attention due to its powerful feature learning capability and robustness. Although several studies have incorporated this technique for text classification, some limitations remain. First, many text datasets are imbalanced, and the learning mechanism of supervised contrastive learning is sensitive to data imbalance, which may harm the model's performance. Second, these models use a classification branch trained with cross-entropy and a separate supervised contrastive learning branch, with no explicit mutual guidance between the two. To this end, we propose a novel model named SharpReCL for imbalanced text classification tasks. First, we obtain…
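To make the imbalance argument concrete, the following is a minimal sketch of the standard supervised contrastive loss, not the SharpReCL implementation. The function names `supcon_loss` and `class_prototypes`, the temperature `tau`, and the choice of class-mean embeddings as prototypes are all assumptions made for illustration; the abstract is truncated before it describes how the authors obtain their prototypes. The sketch counts how many same-class positives each anchor sees in a batch, which is where a skewed label distribution shows up.

```python
import numpy as np


def supcon_loss(z, y, tau=0.1):
    """Standard supervised contrastive loss on a batch (illustrative only).

    z: (n, d) embeddings, y: (n,) integer labels, tau: temperature (assumed).
    Returns the mean loss over anchors that have at least one positive,
    plus the number of positives available to each anchor.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise rows
    n = len(y)
    self_mask = np.eye(n, dtype=bool)

    # Pairwise similarities; an anchor is never compared with itself.
    sim = np.where(self_mask, -np.inf, z @ z.T / tau)

    # Numerically stable log-softmax over the other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )

    # Positives are the other samples that share the anchor's label.
    pos = (y[:, None] == y[None, :]) & ~self_mask
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0  # anchors without positives contribute nothing

    sum_pos = np.where(pos, log_prob, 0.0).sum(axis=1)
    loss = -(sum_pos[valid] / n_pos[valid]).mean()
    return loss, n_pos


def class_prototypes(z, y):
    """Hypothetical prototype definition: the mean embedding of each class."""
    classes = np.unique(y)
    protos = np.stack([z[y == c].mean(axis=0) for c in classes])
    return classes, protos


# A small imbalanced batch: six samples of class 0, two of class 1.
rng = np.random.default_rng(0)
z_demo = rng.normal(size=(8, 4))
y_demo = np.array([0] * 6 + [1] * 2)

loss_demo, n_pos_demo = supcon_loss(z_demo, y_demo)
classes_demo, protos_demo = class_prototypes(z_demo, y_demo)

# Each majority anchor has five positives; each minority anchor has one.
print(n_pos_demo)
```

In this batch the majority class supplies most of the positive pairs, and a class with a single sample would have none at all. One common remedy, assumed here rather than taken from the paper, is to add one prototype per class to every batch so that each anchor has at least one same-class positive.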

Taxonomy

Topics: Text and Document Classification Technologies

Methods: Sparse Evolutionary Training · Contrastive Learning