Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks

Laura Aina; Nikos Voskarides; Roi Blanco

arXiv:2210.12022·cs.CL·October 24, 2022

TL;DR

This paper compares training procedures for adapting pre-trained language models to text classification across model and training-set sizes. It highlights trade-offs between performance and efficiency and finds that prompting combined with knowledge distillation can reduce compute and data cost at the same time.

Contribution

The paper systematically compares standard fine-tuning, prompting, and knowledge distillation (KD), with the KD teacher trained by either fine-tuning or prompting, as model and training-set size vary. It finds that prompting combined with KD can lower both compute and data cost.

Findings

01. Fine-tuning and prompting work well for training large LMs on large training sets.

02. Prompting combined with knowledge distillation can reduce compute and data cost at the same time.

03. More efficient alternatives to standard fine-tuning and prompting can reduce compute or data cost.

Abstract

Pre-trained language models (LMs) obtain state-of-the-art performance when adapted to text classification tasks. However, when using such models in real-world applications, efficiency considerations are paramount. In this paper, we study how different training procedures that adapt LMs to text classification perform, as we vary model and train set size. More specifically, we compare standard fine-tuning, prompting, and knowledge distillation (KD) when the teacher was trained with either fine-tuning or prompting. Our findings suggest that even though fine-tuning and prompting work well to train large LMs on large train sets, there are more efficient alternatives that can reduce compute or data cost. Interestingly, we find that prompting combined with KD can reduce compute and data cost at the same time.
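
To make the knowledge distillation setup mentioned in the abstract more concrete, the following is a minimal sketch of a soft-label distillation loss for a classifier. It assumes the commonly used temperature-scaled KL-divergence formulation; the abstract does not state the exact objective the authors use, so the function names (`softmax`, `distillation_loss`) and the `temperature` parameter are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw class scores into probabilities, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened class distributions.

    Assumption: this is the generic soft-label KD objective, used here only
    for illustration. The teacher could have been trained with fine-tuning
    or with prompting; the loss only needs its output logits.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


# Hypothetical logits for a 3-class task.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, -0.5]
loss = distillation_loss(student, teacher)
```

In this sketch the smaller student is trained to match the teacher's predicted class distribution rather than gold labels, which is why distillation can use unlabeled text once a teacher exists.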

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Knowledge Distillation