Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Mehul Damani; Idan Shenfeld; Andi Peng; Andreea Bobu; Jacob Andreas

arXiv:2410.04707 · cs.LG · October 8, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces an input-adaptive method for allocating computational resources in language model decoding, improving efficiency or quality by dynamically adjusting the amount of computation based on input difficulty.

Contribution

The paper proposes predicting the distribution of rewards for a given input and computation budget, then allocating decoding resources adaptively, which reduces computation or improves output quality across several tasks.

Findings

1. Reduced computation by up to 50% without quality loss
2. Improved output quality by up to 10% within fixed budgets
3. Effective in programming, mathematics, and dialog tasks

Abstract

Computationally intensive decoding procedures--including search, reranking, and self-critique--can improve the quality of language model (LM) outputs in problems spanning code generation, numerical reasoning, and dialog. Existing work typically applies the same decoding procedure for every input to an LM. But not all inputs require the same amount of computation to process. Can we allocate decoding computation adaptively, using more resources to answer questions whose answers will be harder to compute? We present an approach that predicts the distribution of rewards given an input and computation budget, then allocates additional computation to inputs for which it is predicted to be most useful. We apply this approach in two decoding procedures: first, an adaptive best-of-k procedure that dynamically selects the number of samples to generate as input to a reranker; second, a routing…
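The adaptive best-of-k idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary reward, independent samples, and a per-query success probability that some learned difficulty predictor has already produced. Under those assumptions the expected reward of best-of-k is 1 - (1 - p)^k, so the gain from one more sample is p(1 - p)^k, and a greedy loop can hand each extra sample to the query with the largest predicted gain. The names `expected_best_of_k`, `allocate_samples`, and `min_per_query` are hypothetical.

```python
import heapq


def expected_best_of_k(p: float, k: int) -> float:
    """Chance that at least one of k independent samples succeeds.

    Assumes a binary reward and a per-sample success probability p.
    """
    return 1.0 - (1.0 - p) ** k


def allocate_samples(success_probs, total_budget, min_per_query=1):
    """Greedily split `total_budget` samples across queries.

    `success_probs` stands in for the output of a learned difficulty
    predictor (an assumption of this sketch). Each extra sample goes to
    the query whose next sample has the largest predicted marginal gain,
    p * (1 - p) ** k, where k is that query's current sample count.
    """
    n = len(success_probs)
    if total_budget < n * min_per_query:
        raise ValueError("budget too small for the per-query minimum")

    alloc = [min_per_query] * n
    remaining = total_budget - n * min_per_query

    # heapq is a min-heap, so gains are stored negated.
    heap = [(-p * (1.0 - p) ** alloc[i], i) for i, p in enumerate(success_probs)]
    heapq.heapify(heap)

    for _ in range(remaining):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        p = success_probs[i]
        heapq.heappush(heap, (-p * (1.0 - p) ** alloc[i], i))
    return alloc


if __name__ == "__main__":
    probs = [0.9, 0.3, 0.05]  # easy, medium, very hard (made-up values)
    adaptive = allocate_samples(probs, total_budget=6)
    uniform = [2, 2, 2]
    for name, ks in (("adaptive", adaptive), ("uniform", uniform)):
        total = sum(expected_best_of_k(p, k) for p, k in zip(probs, ks))
        print(name, ks, round(total, 4))
```

With these made-up probabilities the greedy loop gives the medium-difficulty query four samples and the other two queries one each. The very hard query gets little extra compute because each additional sample is unlikely to help. Reviewer 02's remark about accurate utility estimation at very low success rates touches on the same point.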

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- The paper adeptly formulates the "adaptive computation scaling allocation" in the context of LM decoding, addressing a topic that is both timely and relevant.
- The proposed computation-allocation framework is comprehensive, covering various cases and scenarios, including binary reward, pairwise optimization in routing, and both online and offline design considerations.
- The experiments conducted on three diverse and representative domains demonstrate the efficiency and efficacy of the propos

Weaknesses

- The main concern is that the current computation-allocation solution is only evaluated in scenarios with identical distributions (i.e., the training data used to train the difficulty model comes from the same distribution as the test set). It is unclear whether the trained difficulty model generalizes to other distributions. The generalizability of the difficulty model is crucial for determining the practicality of the proposed computation-allocation framework.
- Following from the above, sinc

Reviewer 02 · Rating 8 · Confidence 4

Strengths

The paper tackles a novel and timely problem, and offers a reasonable approach. The paper is clearly written.

Weaknesses

A small criticism is the naming convention of online versus offline. Online optimization refers to "optimization problems having no or incomplete knowledge of the future (online)," which is not how online is used in this paper. Other than that, this paper is a good step in improving adaptive test-time compute, identifying the importance of accurate utility estimation in problems with very low success rates.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. Scaling test-time compute is effective but costly; this work contributes to a timely direction with a smart input-adaptive allocation scheme that improves test-time efficiency.
2. The empirical improvement in efficiency is noticeable, and this work covers adaptive allocation in representative, popular subdomains: sampling, model size, and decoding method.
3. The analysis presented in Figure 6 is intuitive.

Weaknesses

1. The selection of datasets and backbone language models may be questionable. I suspect this method should ideally generalize across tasks; however, only a single dataset is selected in each domain. I expect to see more tasks, such as HumanEval and MBPP for coding, and Hendrycks MATH and GSM for math. Meanwhile, for each domain, the authors select a specific backbone LM rather than using the same choice across all tasks. This may raise concerns about the generalization of the proposed method.
2. The underl

Taxonomy

Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing