Supervised Contrastive Learning as Multi-Objective Optimization for Fine-Tuning Large Pre-trained Language Models
Youness Moukafih, Mounir Ghogho, Kamel Smaili

TL;DR
This paper formulates supervised contrastive learning for fine-tuning large language models as a multi-objective optimization problem and proposes solutions that improve performance on benchmark tasks without data augmentation or adversarial examples.
Contribution
It introduces a multi-objective optimization framework for supervised contrastive learning during fine-tuning of large language models, solved with linear scalarization (LS) and the Exact Pareto Optimal (EPO) method to manage the trade-off between the two objectives.
Findings
Significant performance improvements on GLUE benchmarks.
Effective multi-objective optimization methods for contrastive learning.
No need for data augmentation or adversarial examples.
Abstract
Recently, Supervised Contrastive Learning (SCL) has been shown to achieve excellent performance in most classification tasks. In SCL, a neural network is trained to optimize two objectives: pull an anchor and positive samples together in the embedding space, and push the anchor apart from the negatives. However, these two objectives may conflict, requiring trade-offs between them during optimization. In this work, we formulate the SCL problem as a Multi-Objective Optimization problem for the fine-tuning phase of the RoBERTa language model. Two methods are utilized to solve the optimization problem: (i) the linear scalarization (LS) method, which minimizes a weighted linear combination of per-task losses; and (ii) the Exact Pareto Optimal (EPO) method, which finds the intersection of the Pareto front with a given preference vector. We evaluate our approach on several GLUE benchmark…
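The linear scalarization idea in the abstract can be illustrated with a minimal sketch. The pull and push terms below are simplified stand-ins based on cosine similarity, not the paper's actual loss decomposition, and the function names (pull_push_losses, linear_scalarization) and the toy embeddings are hypothetical. The sketch only shows how two possibly conflicting objectives are collapsed into one scalar by a preference vector.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pull_push_losses(embeddings, labels):
    """Return (pull, push) for a batch of embeddings.

    ASSUMPTION: illustrative objectives only, not the paper's formulation.
    pull: mean (1 - cosine) over same-label pairs (small when positives are close).
    push: mean max(0, cosine) over different-label pairs (small when negatives are apart).
    """
    pos, neg = [], []
    for i, j in combinations(range(len(labels)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            pos.append(1.0 - sim)
        else:
            neg.append(max(0.0, sim))
    pull = sum(pos) / len(pos) if pos else 0.0
    push = sum(neg) / len(neg) if neg else 0.0
    return pull, push


def linear_scalarization(losses, preference):
    """Weighted linear combination of the objectives.

    The preference vector must be non-negative and sum to 1.
    """
    if len(losses) != len(preference):
        raise ValueError("losses and preference must have the same length")
    if any(w < 0 for w in preference) or not math.isclose(sum(preference), 1.0):
        raise ValueError("preference must be non-negative and sum to 1")
    return sum(w * loss for w, loss in zip(preference, losses))


# Toy batch: two samples of class 0, two of class 1 (hypothetical values).
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 0, 1, 1]

pull, push = pull_push_losses(embeddings, labels)
total = linear_scalarization((pull, push), (0.5, 0.5))
print(f"pull={pull:.4f} push={push:.4f} scalarized={total:.4f}")
```

Changing the preference vector, for example to (0.9, 0.1), shifts the emphasis toward the pull objective. EPO, by contrast, searches for the Pareto-optimal point that lies along a given preference vector and is not reproduced here.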
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Graph Neural Networks
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Weight Decay · Attention Dropout · Dense Connections · WordPiece · Layer Normalization
