ACL: Aligned Contrastive Learning Improves BERT and Multi-exit BERT Fine-tuning
Liz Li, Wei Zhu

TL;DR
This paper introduces ACL, a contrastive learning framework for supervised BERT fine-tuning that resolves the conflict between the contrastive objective and the cross-entropy loss, and improves performance, especially for multi-exit BERT models used in low-latency settings.
Contribution
The paper proposes ACL, a new contrastive learning approach that aligns label embeddings with sample representations and enhances multi-exit BERT fine-tuning.
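As a rough illustration of what "aligning label embeddings with sample representations" could look like, the sketch below treats each label embedding as one extra point carrying its own class id and applies a supervised-contrastive-style loss over the combined set. The exact loss used by ACL-Embed is not given on this page, so the loss form, the function name `label_aligned_contrastive_loss`, and the `temperature` parameter are all assumptions made for illustration.

```python
import math


def _normalize(v):
    # Scale a vector to unit length (guarding against the zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def label_aligned_contrastive_loss(reps, labels, label_embeds, temperature=0.1):
    """Hypothetical SupCon-style loss over samples plus label embeddings.

    reps: list of sample vectors; labels: list of int class ids;
    label_embeds: list of vectors, one per class (index = class id).
    This is an assumed form, not the paper's stated objective.
    """
    # Treat each label embedding as one extra "sample" with its own class id.
    points = [_normalize(v) for v in reps] + [_normalize(v) for v in label_embeds]
    classes = list(labels) + list(range(len(label_embeds)))

    total, n_anchors = 0.0, 0
    for i, (p_i, c_i) in enumerate(zip(points, classes)):
        positives = [j for j, c in enumerate(classes) if j != i and c == c_i]
        if not positives:
            continue
        # Log of the softmax denominator over all other points (stable form).
        logits = [_dot(p_i, points[j]) / temperature
                  for j in range(len(points)) if j != i]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Average negative log-probability of selecting a same-class point.
        total += -sum(_dot(p_i, points[j]) / temperature - log_denom
                      for j in positives) / len(positives)
        n_anchors += 1
    return total / max(n_anchors, 1)
```

Under this assumed form, the loss is small when each label embedding points in the same direction as the representations of its class and large when the label embeddings are swapped, which is the alignment behaviour the contribution describes.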
Findings
ACL outperforms or matches baseline methods on GLUE tasks.
ACL significantly improves multi-exit BERT performance.
ACL provides better speed-quality tradeoffs for low-latency applications.
Abstract
Despite its success in self-supervised learning, contrastive learning (CL) is less studied in the supervised setting. In this work, we first use a set of pilot experiments to show that in the supervised setting, the cross-entropy (CE) loss objective and the contrastive learning objective often conflict with each other, thus hindering the application of CL in supervised settings. To resolve this problem, we introduce a novel Aligned Contrastive Learning (ACL) framework. First, ACL-Embed regards label embeddings as extra augmented samples with different labels and employs contrastive learning to align the label embeddings with their samples' representations. Second, to facilitate the optimization of the ACL-Embed objective combined with the CE loss, we propose ACL-Grad, which will discard the ACL-Embed term if the two objectives are in conflict. To further…
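The ACL-Grad idea of discarding the ACL-Embed term when the two objectives conflict can be sketched as a simple gate on gradients. The abstract excerpt does not say how conflict is detected, so the criterion used here (a negative inner product between the CE gradient and the ACL-Embed gradient) is an assumption, as are the function name `combine_gradients` and the `weight` parameter.

```python
def combine_gradients(grad_ce, grad_acl, weight=1.0):
    """Hypothetical gate: keep the ACL-Embed gradient only when it does not
    oppose the CE gradient.

    grad_ce, grad_acl: flat lists of floats of equal length.
    Returns (combined_gradient, dropped), where `dropped` is True when the
    ACL-Embed term was discarded.
    """
    # Assumed conflict test: the two gradients point in opposing directions.
    inner = sum(a * b for a, b in zip(grad_ce, grad_acl))
    if inner < 0:
        # Conflict: fall back to the CE gradient alone.
        return list(grad_ce), True
    # No conflict: add the (optionally weighted) ACL-Embed gradient.
    return [a + weight * b for a, b in zip(grad_ce, grad_acl)], False
```

For example, `combine_gradients([1.0, 0.0], [-1.0, 0.5])` returns the CE gradient unchanged with `dropped=True`, while `combine_gradients([1.0, 0.0], [0.5, 0.5])` returns the summed gradient `[1.5, 0.5]` with `dropped=False`.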
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Mobile Crowdsensing and Crowdsourcing
