Confident Adaptive Language Modeling

Tal Schuster; Adam Fisch; Jai Gupta; Mostafa Dehghani; Dara Bahri; Vinh Q. Tran; Yi Tay; Donald Metzler

arXiv:2207.07061 · cs.CL · October 26, 2022 · 39 cites

TL;DR

CALM is a framework that dynamically adjusts the compute spent per input and generation timestep during language model inference. It exits early when a confidence measure is high enough, with a potential speedup of up to 3x while maintaining high performance across diverse text generation tasks.

Contribution

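This work introduces CALM, a method for adaptive compute allocation in language models through confidence-based early exits. It addresses challenges of early exit decoding such as which confidence measure to use and how to connect sequence-level constraints to local per-token exit decisions.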

Findings

01. Potential speedup of up to 3x in inference

02. Maintains high performance with reduced compute

03. Effective across diverse text generation tasks

Abstract

Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks. These gains come with a drastic increase in the models' size, potentially leading to slow and costly use at inference time. In practice, however, the series of generations made by LLMs is composed of varying levels of difficulty. While certain predictions truly benefit from the models' full capacity, other continuations are more trivial and can be solved with reduced compute. In this work, we introduce Confident Adaptive Language Modeling (CALM), a framework for dynamically allocating different amounts of compute per input and generation timestep. Early exit decoding involves several challenges that we address here, such as: (1) what confidence measure to use; (2) connecting sequence-level constraints to local per-token exit decisions; and (3) attending…
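To make the idea of confidence-based early exiting concrete, here is a minimal Python sketch of a per-token exit decision. It is an illustration under stated assumptions, not the authors' implementation: the confidence measure (the gap between the two largest softmax probabilities), the function names, the toy logits, and the threshold values are all hypothetical choices made for this example. It also assumes that every layer's hidden state can be projected to vocabulary logits, which are passed in precomputed.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_gap(probs: Sequence[float]) -> float:
    """Assumed confidence measure: gap between the two largest probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def early_exit_token(
    layer_logits: Sequence[Sequence[float]], threshold: float
) -> Tuple[int, int]:
    """Pick a token from the shallowest layer that is confident enough.

    layer_logits holds one logit vector per layer, ordered shallow to deep.
    Returns (token_id, layers_used). The last layer always produces an
    answer, so the full model acts as the fallback.
    """
    if not layer_logits:
        raise ValueError("layer_logits must be non-empty")
    last = len(layer_logits)
    for depth, logits in enumerate(layer_logits, start=1):
        probs = softmax(logits)
        if depth == last or top2_gap(probs) >= threshold:
            token_id = max(range(len(probs)), key=probs.__getitem__)
            return token_id, depth
    raise AssertionError("unreachable: the last layer always returns")


# Hypothetical logits for one token over a 3-word vocabulary, 3 layers deep.
TOY_LAYER_LOGITS = [
    [0.1, 0.2, 0.0],  # shallow layer: nearly uniform, low confidence
    [0.5, 2.5, 0.0],  # middle layer: token 1 starts to dominate
    [0.0, 5.0, 0.0],  # final layer: token 1 is clearly preferred
]

if __name__ == "__main__":
    for tau in (0.01, 0.5, 0.99):
        token, used = early_exit_token(TOY_LAYER_LOGITS, tau)
        print(f"threshold={tau}: token={token}, layers_used={used}")
```

With these toy values, a threshold of 0.5 exits after the second of three layers, while a threshold of 0.99 falls through to the final layer. The sketch treats the threshold as a given constant; how to choose it so that per-token exits respect a sequence-level quality constraint is one of the challenges the abstract lists and is not covered here.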

Videos

Confident Adaptive Language Modeling · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Early exiting using confidence measures