ACL: Aligned Contrastive Learning Improves BERT and Multi-exit BERT Fine-tuning

Liz Li; Wei Zhu

arXiv:2602.03563·cs.CL·February 13, 2026

Open Access

TL;DR

This paper introduces ACL, a contrastive learning framework for supervised BERT fine-tuning that resolves conflicts between the contrastive objective and the cross-entropy loss, and improves performance, especially for multi-exit BERT models in low-latency settings.

Contribution

The paper proposes ACL, a new contrastive learning approach that aligns label embeddings with sample representations and enhances multi-exit BERT fine-tuning.

Findings

01. ACL outperforms or matches baseline methods on GLUE tasks.

02. ACL significantly improves multi-exit BERT performance.

03. ACL provides better speed-quality tradeoffs for low-latency applications.

Abstract

Despite its success in self-supervised learning, contrastive learning is less studied in the supervised setting. In this work, we first use a set of pilot experiments to show that, in the supervised setting, the cross-entropy (CE) loss objective and the contrastive learning (CL) objective often conflict with each other, thus hindering the application of CL in supervised settings. To resolve this problem, we introduce a novel Aligned Contrastive Learning (ACL) framework. First, ACL-Embed regards label embeddings as extra augmented samples with different labels and employs contrastive learning to align the label embeddings with their samples' representations. Second, to facilitate the optimization of the ACL-Embed objective combined with the CE loss, we propose ACL-Grad, which discards the ACL-Embed term if the two objectives are in conflict. To further…
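The ACL-Grad idea described in the abstract (drop the ACL-Embed term when it conflicts with the CE loss) can be illustrated with a small sketch. The abstract does not say how conflict is detected, so this example assumes a common criterion: the two gradients conflict when their dot product is negative. The function names `dot` and `combine_gradients` and the `weight` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def combine_gradients(
    grad_ce: Sequence[float],
    grad_cl: Sequence[float],
    weight: float = 1.0,
) -> List[float]:
    """Combine the CE gradient with the contrastive gradient.

    Assumption (not stated in the abstract): the two objectives are
    "in conflict" when their gradients have a negative dot product.
    In that case the contrastive term is discarded and only the CE
    gradient is used; otherwise the weighted sum is returned.
    """
    if dot(grad_ce, grad_cl) < 0:
        # Conflict: keep only the cross-entropy direction.
        return list(grad_ce)
    # No conflict: add the (weighted) contrastive gradient.
    return [g + weight * h for g, h in zip(grad_ce, grad_cl)]


if __name__ == "__main__":
    # Gradients point in opposing directions, so the contrastive term is dropped.
    print(combine_gradients([1.0, 0.0], [-1.0, 0.5]))
    # Gradients agree, so both terms contribute.
    print(combine_gradients([1.0, 0.0], [0.5, 0.5]))
```

In a real fine-tuning loop the two vectors would be the flattened parameter gradients of each loss; the list-based version here only shows the gating logic under the stated assumption.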

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Mobile Crowdsensing and Crowdsourcing