Optimizing Temperature for Language Models with Multi-Sample Inference
Weihua Du, Yiming Yang, Sean Welleck

TL;DR
This paper introduces an entropy-based method for automatically selecting optimal temperature settings in large language models, improving multi-sample aggregation without needing labeled validation data.
Contribution
It proposes a novel entropy-based metric and stochastic process model for automatic temperature tuning, enhancing model performance across various architectures and tasks.
Findings
Entropy-based metric outperforms fixed-temperature baselines
Method works across different model sizes and datasets
Provides interpretability through stochastic process modeling
Abstract
Multi-sample aggregation strategies, such as majority voting and best-of-N sampling, are widely used in contemporary large language models (LLMs) to enhance predictive accuracy across various tasks. A key challenge in this process is temperature selection, which significantly impacts model performance. Existing approaches either rely on a fixed default temperature or require labeled validation data for tuning, which are often scarce and difficult to obtain. This paper addresses the challenge of automatically identifying the (near)-optimal temperature for different LLMs using multi-sample aggregation strategies, without relying on task-specific validation data. We provide a comprehensive analysis of temperature's role in performance optimization, considering variations in model architectures, datasets, task types, model sizes, and predictive accuracy. Furthermore, we propose a novel…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling
