Distance Metric Learning Loss Functions in Few-Shot Scenarios of Supervised Language Models Fine-Tuning

Witold Sosnowski, Karolina Seweryn, Anna Wróblewska, Piotr Gawrysiak

arXiv:2211.15195 · cs.CL · November 29, 2022

TL;DR

This paper investigates how Distance Metric Learning loss functions affect the performance of supervised language model fine-tuning in few-shot classification tasks, showing improvements over standard loss functions.

Contribution

It demonstrates that DML loss functions, especially SoftTriple, enhance the few-shot classification performance of RoBERTa-large models, and it provides an explainability analysis of these models.

Findings

1. DML loss functions improve the downstream classification performance of RoBERTa-large models in few-shot scenarios.
2. SoftTriple loss outperforms categorical cross-entropy by up to 13.48 percentage points.
3. Explainability techniques help assess model reliability.

Abstract

This paper presents an analysis of the influence of Distance Metric Learning (DML) loss functions on the supervised fine-tuning of language models for classification tasks. We experimented with well-known datasets from the SentEval Transfer Tasks. Our experiments show that applying a DML loss function can improve the performance of RoBERTa-large models on downstream classification tasks in few-shot scenarios. Models fine-tuned with SoftTriple loss can achieve better results than models trained with the standard categorical cross-entropy loss function by about 2.89 percentage points on average, with gains ranging from 0.04 to 13.48 percentage points depending on the training dataset. Additionally, we conducted a comprehensive analysis with explainability techniques to assess the models' reliability and explain their results.
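The abstract contrasts SoftTriple loss with categorical cross-entropy but does not spell out how SoftTriple works. The following is a minimal, pure-Python sketch of the SoftTriple formulation (Qian et al., 2019) for a single example, assuming L2-normalized embeddings and a fixed set of K centers per class; in real fine-tuning the centers are learned jointly with the encoder. It omits the original loss's regularizer that merges redundant centers and is not the authors' RoBERTa-large training code. The function names and the default hyperparameters (`la`, `gamma`, `margin`) are illustrative assumptions, not values reported in this paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (SoftTriple compares directions only)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def soft_class_similarity(x: Vector, centers: Sequence[Vector], gamma: float) -> float:
    """Relaxed similarity between a unit embedding x and one class with K centers.

    Each center's cosine similarity is weighted by a softmax over the class's
    centers with temperature gamma, so the closest center dominates.
    """
    sims = [_dot(x, _normalize(w)) for w in centers]
    m = max(s / gamma for s in sims)  # shift exponents for numerical stability
    weights = [math.exp(s / gamma - m) for s in sims]
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, sims))


def soft_triple_loss(
    x: Vector,
    label: int,
    class_centers: Sequence[Sequence[Vector]],
    la: float = 20.0,      # hypothetical default: scale of the class logits
    gamma: float = 0.1,    # hypothetical default: softmax temperature over centers
    margin: float = 0.01,  # hypothetical default: margin on the true class
) -> float:
    """SoftTriple loss for one example (center-merging regularizer omitted)."""
    x = _normalize(x)
    logits = []
    for c, centers in enumerate(class_centers):
        s = soft_class_similarity(x, centers, gamma)
        if c == label:
            s -= margin  # the margin is subtracted only for the true class
        logits.append(la * s)
    # Softmax cross-entropy over the scaled, margin-adjusted class similarities.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


if __name__ == "__main__":
    # Two hypothetical classes in 2-D, two fixed centers each.
    centers = [
        [[1.0, 0.0], [0.9, 0.1]],   # class 0
        [[0.0, 1.0], [-0.1, 0.9]],  # class 1
    ]
    x = [0.95, 0.05]
    print(soft_triple_loss(x, 0, centers))  # near zero: x lies close to class 0
    print(soft_triple_loss(x, 1, centers))  # large: x is far from class 1
```

With one center per class and `margin=0`, the soft class similarity reduces to the cosine between the embedding and that center, so the loss becomes a scaled (normalized-softmax) cross-entropy. The extra centers are what allow each class to cover several clusters in the embedding space.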

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications