Standing on the Shoulders of Giant Frozen Language Models

Yoav Levine; Itay Dalmedigos; Ori Ram; Yoel Zeldes; Daniel Jannai; Dor Muhlgay; Yoni Osin; Opher Lieber; Barak Lenz; Shai Shalev-Shwartz; Amnon Shashua; Kevin Leyton-Brown; Yoav Shoham

arXiv:2204.10019·cs.CL·April 22, 2022·20 cites

Open Access

TL;DR

This paper introduces advanced techniques for leveraging large frozen language models, achieving performance comparable to fine-tuning without sacrificing model versatility, thus unlocking their untapped potential across various tasks.

Contribution

The paper presents three novel methods (input-dependent prompt tuning, frozen readers, and recursive LMs) that significantly enhance the capabilities of frozen language models.

Findings

1. Some methods outperform fine-tuning in certain domains.
2. Frozen models can match fine-tuning performance with new techniques.
3. The proposed methods are computationally efficient relative to model size.

Abstract

Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its weights untouched -- still often underperform fine-tuning approaches which modify these weights in a task-dependent way. Those, in turn, suffer forgetfulness and compromise versatility, suggesting a tradeoff between performance and versatility. The main message of this paper is that current frozen-model techniques such as prompt tuning are only the tip of the iceberg, and more powerful methods for leveraging frozen LMs can do just as well as fine tuning in challenging domains without sacrificing the underlying model's versatility. To demonstrate…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications