Embedding Recycling for Language Models

Jon Saad-Falcon; Amanpreet Singh; Luca Soldaini; Mike D'Arcy; Arman Cohan; Doug Downey

arXiv:2207.04993·cs.CL·February 1, 2023·1 cites

TL;DR

Embedding recycling (ER) reuses cached model activations to speed up training and inference across multiple language models and tasks with minimal accuracy loss.

Contribution

This paper provides an extensive evaluation of ER techniques across eight models and fourteen English-language tasks, accounting for overhead costs and demonstrating the practical effectiveness of ER for speeding up training and inference.

Findings

01. Over 90% training speedup with minimal accuracy impact
02. Effective ER across models from 17M to 900M parameters
03. Identifies key areas for future research in ER methods

Abstract

Real-world applications of neural language models often involve running many different models over the same corpus. The high computational cost of these runs has led to interest in techniques that can reuse the contextualized embeddings produced in previous runs to speed training and inference of future ones. We refer to this approach as embedding recycling (ER). While multiple ER techniques have been proposed, their practical effectiveness is still unknown because existing evaluations consider very few models and do not adequately account for overhead costs. We perform an extensive evaluation of ER across eight different models (17 to 900 million parameters) and fourteen tasks in English. We show how a simple ER technique that caches activations from an intermediate layer of a pretrained model, and learns task-specific adapters on the later layers, is broadly effective. For the…
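The caching idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the layer function, the depth (NUM_LAYERS), the cache point (CACHE_LAYER), the RecyclingEncoder class, and the scalar adapter_scale standing in for a learned adapter are all hypothetical names chosen for illustration. The sketch only shows the control flow: the frozen lower layers run once per input and their output is stored, while each task reruns only the upper layers with its own task-specific parameters.

```python
import hashlib
import math

NUM_LAYERS = 6   # hypothetical model depth
CACHE_LAYER = 3  # hypothetical cache point: layers [0, 3) are frozen and cached


def layer(x, i):
    """Toy stand-in for layer i: a fixed elementwise nonlinearity."""
    return [math.tanh(v + 0.1 * (i + 1)) for v in x]


def embed(text):
    """Toy input embedding: one scaled value per UTF-8 byte."""
    return [b / 255.0 for b in text.encode("utf-8")]


def upper(x, adapter_scale):
    """Run the later layers; adapter_scale stands in for a learned adapter."""
    for i in range(CACHE_LAYER, NUM_LAYERS):
        x = [adapter_scale * v for v in layer(x, i)]
    return x


def full_forward(text, adapter_scale):
    """Reference path with no caching: every layer runs on every call."""
    x = embed(text)
    for i in range(CACHE_LAYER):
        x = layer(x, i)
    return upper(x, adapter_scale)


class RecyclingEncoder:
    """Caches the intermediate activations of the frozen lower layers."""

    def __init__(self):
        self.cache = {}       # text hash -> activations after CACHE_LAYER layers
        self.lower_calls = 0  # how many times the lower layers actually ran

    def lower(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.lower_calls += 1
            x = embed(text)
            for i in range(CACHE_LAYER):
                x = layer(x, i)
            self.cache[key] = x
        return self.cache[key]

    def forward(self, text, adapter_scale):
        # Only the upper layers are recomputed for each task.
        return upper(self.lower(text), adapter_scale)


if __name__ == "__main__":
    enc = RecyclingEncoder()
    task_a = enc.forward("same corpus sentence", adapter_scale=1.0)
    task_b = enc.forward("same corpus sentence", adapter_scale=0.5)
    print("lower-layer runs:", enc.lower_calls)
```

In this toy, two tasks over the same sentence trigger a single pass through the lower layers. In a real system the cache would live on disk or in a key-value store, and the cost of reading and storing activations is part of the overhead that the abstract says prior evaluations did not adequately account for.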

Code & Models

Repositories

allenai/embeddingrecycling (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings