Distillation of encoder-decoder transformers for sequence labelling

Marco Farina; Duccio Pappadopulo; Anant Gupta; Leslie Huang; Ozan İrsoy; Thamar Solorio

arXiv:2302.05454·cs.CL·February 14, 2023

Open Access

TL;DR

This paper introduces a hallucination-free framework for sequence labelling that is especially suited for distillation. It reports new state-of-the-art results on multiple sequence labelling datasets and is shown to be useful for distilling a large model in a few-shot learning scenario.

Contribution

It proposes a hallucination-free framework for sequence tagging that is especially suited for distillation, so that the knowledge acquired by large models can be used in a compute-efficient manner.

Findings

01 Achieves new state-of-the-art on multiple datasets

02 Effective in few-shot learning scenarios

03 Reduces hallucinations in distilled models

Abstract

Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop bigger language models. This race for bigger models has also underscored the need to continue the pursuit of practical distillation approaches that can leverage the knowledge acquired by these big models in a compute-efficient manner. Having this goal in mind, we build on recent work to propose a hallucination-free framework for sequence tagging that is especially suited for distillation. We show empirical results of new state-of-the-art performance across multiple sequence labelling datasets and validate the usefulness of this framework for distilling a large model in a few-shot learning scenario.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Bioinformatics