Applying SoftTriple Loss for Supervised Language Model Fine Tuning

Witold Sosnowski; Anna Wroblewska; Piotr Gawrysiak

arXiv:2112.08462 · cs.CL · November 28, 2022 · 1 citation

Open Access

TL;DR

This paper proposes a new loss function, TripleEntropy, to enhance supervised fine-tuning of language models, showing consistent improvements, especially on smaller datasets.

Contribution

Introduction of TripleEntropy, a loss function combining cross-entropy and SoftTriple loss that improves classification performance over a cross-entropy baseline when fine-tuning pre-trained language models.

Findings

01. Steady performance gains across multiple datasets

02. Greater improvements observed with smaller training datasets

03. Enhanced robustness of fine-tuned models

Abstract

We introduce TripleEntropy, a new loss function based on cross-entropy and SoftTriple loss, to improve classification performance when fine-tuning general-knowledge pre-trained language models. This loss function improves a strong RoBERTa baseline fine-tuned with cross-entropy loss by about 0.02% to 2.29%. Thorough tests on popular datasets show a steady gain. In general, the fewer samples in the training dataset, the higher the gain: 0.78% for small datasets, 0.86% for medium-sized, 0.20% for large, and 0.04% for extra-large datasets.
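The abstract only says that TripleEntropy is built from cross-entropy and SoftTriple loss; it does not spell out how the two are combined. The sketch below shows one plausible single-example formulation in plain Python, assuming (1) a convex mix controlled by a hypothetical weight `beta`, (2) SoftTriple applied to the pooled sentence embedding while cross-entropy uses the classifier logits, and (3) illustrative hyperparameter defaults (`lam`, `gamma`, `delta`). SoftTriple's center-merging regularizer is omitted. All function names are hypothetical, not the authors' code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (SoftTriple compares normalized vectors)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(logits: Vector) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits: Vector, label: int) -> float:
    """Softmax cross-entropy for a single example."""
    return -log_softmax(logits)[label]


def class_similarity(x: Vector, centers: Sequence[Vector], gamma: float) -> float:
    """Relaxed similarity to one class: softmax(sim / gamma)-weighted average
    of x's cosine similarity to each of the class's K centers."""
    sims = [sum(a * b for a, b in zip(x, l2_normalize(w))) for w in centers]
    m = max(sims)
    weights = [math.exp((s - m) / gamma) for s in sims]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)


def softtriple_loss(x: Vector, class_centers: Sequence[Sequence[Vector]],
                    label: int, lam: float = 20.0, gamma: float = 0.1,
                    delta: float = 0.01) -> float:
    """SoftTriple loss for one embedding (regularizer omitted in this sketch)."""
    x = l2_normalize(x)
    sims = [class_similarity(x, centers, gamma) for centers in class_centers]
    # The margin delta is subtracted only from the true class before scaling.
    logits = [lam * (s - delta) if c == label else lam * s
              for c, s in enumerate(sims)]
    return cross_entropy(logits, label)


def triple_entropy(logits: Vector, embedding: Vector,
                   class_centers: Sequence[Sequence[Vector]], label: int,
                   beta: float = 0.5) -> float:
    """Hypothetical TripleEntropy: (1 - beta) * CE + beta * SoftTriple."""
    ce = cross_entropy(logits, label)
    st = softtriple_loss(embedding, class_centers, label)
    return (1.0 - beta) * ce + beta * st


# Toy usage: 2 classes, K = 2 centers per class, 2-D embeddings.
centers = [
    [[1.0, 0.0], [0.9, 0.1]],  # class 0
    [[0.0, 1.0], [0.1, 0.9]],  # class 1
]
embedding = [0.95, 0.05]  # e.g. a pooled sentence vector lying near class 0
logits = [2.0, -1.0]      # classifier-head output for the same example
print(triple_entropy(logits, embedding, centers, label=0))
```

Setting `beta=0.0` recovers the plain cross-entropy objective the abstract uses as its baseline, which makes the mixing weight easy to sanity-check. A real fine-tuning setup would compute this batched with a tensor library and learn the centers jointly with the encoder.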

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Linear Layer · Weight Decay · Attention Is All You Need · Dropout · Softmax · Linear Warmup With Linear Decay · Attention Dropout · Dense Connections