Distance Metric Learning Loss Functions in Few-Shot Scenarios of Supervised Language Models Fine-Tuning
Witold Sosnowski, Karolina Seweryn, Anna Wróblewska, Piotr Gawrysiak

TL;DR
This paper investigates how Distance Metric Learning loss functions affect the performance of supervised language model fine-tuning in few-shot classification tasks, showing improvements over standard loss functions.
Contribution
It demonstrates that DML loss functions, especially SoftTriple, enhance few-shot classification performance of RoBERTa-large models and provides explainability analysis of these models.
Findings
DML loss functions improve downstream task performance.
SoftTriple loss outperforms cross-entropy by up to 13.48 percentage points.
Explainability techniques help assess model reliability.
Abstract
This paper analyzes the influence of Distance Metric Learning (DML) loss functions on the supervised fine-tuning of language models for classification tasks. We experimented with well-known datasets from the SentEval Transfer Tasks. Our experiments show that applying a DML loss function can increase the performance of RoBERTa-large models on downstream classification tasks in few-shot scenarios. Models fine-tuned with SoftTriple loss can achieve better results than models with a standard categorical cross-entropy loss function by about 2.89 percentage points, with gains ranging from 0.04 to 13.48 percentage points depending on the training dataset. Additionally, we carried out a comprehensive analysis with explainability techniques to assess the models' reliability and explain their results.
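For readers unfamiliar with SoftTriple, the following is a minimal pure-Python sketch of the loss for a single example. It follows the general formulation from the original SoftTriple paper (Qian et al., 2019), not the authors' implementation. The function names, the toy centers, and the default values of `lam`, `gamma`, and `delta` are assumptions for illustration; the abstract does not state which hyperparameters were used or how the loss was attached to RoBERTa-large.

```python
import math


def _normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def class_similarity(x, centers, gamma):
    """Relaxed similarity between a unit embedding x and one class.

    Each class owns several centers. The similarity is a softmax-weighted
    average of the per-center cosine similarities, with temperature gamma.
    """
    sims = [_dot(x, _normalize(w)) for w in centers]
    m = max(sims)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / gamma) for s in sims]
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, sims))


def soft_triple_loss(embedding, centers_per_class, label,
                     lam=20.0, gamma=0.1, delta=0.01):
    """SoftTriple loss for one example (illustrative defaults, assumed).

    lam scales the logits, gamma is the center-softmax temperature,
    and delta is a margin subtracted from the true-class similarity.
    """
    x = _normalize(embedding)
    logits = []
    for c, centers in enumerate(centers_per_class):
        s = class_similarity(x, centers, gamma)
        if c == label:
            s -= delta  # margin applies to the true class only
        logits.append(lam * s)
    # Cross-entropy over the scaled similarities, via log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


# Toy setup (hypothetical): 2 classes, 2 centers each, 2-D embeddings.
toy_centers = [
    [[1.0, 0.0], [0.0, 1.0]],    # class 0
    [[-1.0, 0.0], [0.0, -1.0]],  # class 1
]
loss_correct = soft_triple_loss([1.0, 0.0], toy_centers, label=0)
loss_wrong = soft_triple_loss([1.0, 0.0], toy_centers, label=1)
```

In this toy setup the embedding coincides with a class-0 center, so the loss under label 0 is close to zero and the loss under label 1 is large. Unlike plain cross-entropy over a single weight vector per class, the multiple centers let one class cover several clusters in embedding space.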
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
