Revisiting Distance Metric Learning for Few-Shot Natural Language Classification

Witold Sosnowski; Anna Wróblewska; Karolina Seweryn; Piotr Gawrysiak

arXiv:2211.15202·cs.CL·November 29, 2022

TL;DR

This paper applies Distance Metric Learning (DML) to few-shot NLP classification by fine-tuning RoBERTa models, and shows that proxy-based DML losses improve performance over fine-tuning with categorical cross-entropy (CCE) alone.

Contribution

The paper systematically evaluates DML loss functions, particularly proxy-based ones, on few-shot NLP classification tasks and shows that they can help during both fine-tuning and inference.

Findings

01

Proxy-based DML losses improve few-shot NLP classification performance.

02

Combining CCE with ProxyAnchor Loss yields the best results.

03

Models trained with the combined loss outperform CCE-only models by about 3.27 percentage points on average, and by up to 10.38 percentage points.
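To make the combined objective concrete, here is a minimal sketch in plain Python of CCE plus a Proxy Anchor term. It is an illustration under stated assumptions, not the authors' implementation. The Proxy Anchor term follows the standard published formulation, with cosine similarity, scale `alpha` and margin `delta`. The weighted sum and its weight `lam`, the function names, and the defaults `alpha=32.0` and `delta=0.1` are assumptions, since this page does not say how the two losses are combined or tuned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cce_loss(logits, labels):
    """Mean categorical cross-entropy over a batch of raw logits."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_z - row[y]
    return total / len(labels)


def proxy_anchor_loss(embeddings, labels, proxies, alpha=32.0, delta=0.1):
    """Proxy Anchor loss with one proxy per class (proxies[c] is class c)."""
    pos_term, neg_term, n_pos_proxies = 0.0, 0.0, 0
    for c, proxy in enumerate(proxies):
        sims = [cosine(x, proxy) for x in embeddings]
        pos = [s for s, y in zip(sims, labels) if y == c]
        neg = [s for s, y in zip(sims, labels) if y != c]
        if pos:  # only proxies with a positive in the batch count here
            n_pos_proxies += 1
            pos_term += math.log1p(sum(math.exp(-alpha * (s - delta)) for s in pos))
        neg_term += math.log1p(sum(math.exp(alpha * (s + delta)) for s in neg))
    pos_mean = pos_term / n_pos_proxies if n_pos_proxies else 0.0
    return pos_mean + neg_term / len(proxies)


def combined_loss(logits, embeddings, labels, proxies, lam=1.0):
    """Assumed combination: CCE plus lam times the Proxy Anchor term."""
    return cce_loss(logits, labels) + lam * proxy_anchor_loss(embeddings, labels, proxies)
```

In an actual fine-tuning setup the embeddings would come from the language model and the proxies would be trainable parameters updated together with it; here both are plain lists so the arithmetic stays visible.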

Abstract

Distance Metric Learning (DML) has attracted much attention in image processing in recent years. This paper analyzes its impact on the supervised fine-tuning of language models for Natural Language Processing (NLP) classification tasks under few-shot learning settings. We investigated several DML loss functions in training RoBERTa language models on known SentEval Transfer Tasks datasets. We also analyzed the possibility of using proxy-based DML losses during model inference. Our systematic experiments have shown that, under few-shot learning settings, proxy-based DML losses in particular can positively affect the fine-tuning and inference of a supervised language model. Models tuned with a combination of CCE (categorical cross-entropy loss) and ProxyAnchor Loss have, on average, the best performance and outperform models with only CCE by about 3.27 percentage points, and by up to 10.38 percentage…
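The abstract mentions using proxy-based DML losses at inference time but does not say how. One plausible reading, offered here only as an assumption, is to classify an input by the class proxy closest to its embedding. The sketch below shows that nearest-proxy rule in plain Python; the function name `predict_by_proxy` and the choice of cosine similarity are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_by_proxy(embedding, proxies):
    """Return the index of the class proxy most similar to the embedding.

    Assumed rule: proxies[c] is the learned proxy vector for class c.
    """
    sims = [cosine(embedding, p) for p in proxies]
    return max(range(len(proxies)), key=lambda c: sims[c])
```

Under this assumption the learned proxies act as class prototypes, so prediction needs only the sentence embedding and no separate classification head.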

Taxonomy

Topics: Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Residual Connection · Dense Connections · Layer Normalization · WordPiece · Linear Warmup With Linear Decay · Softmax