Improving Diversity in Language Models: When Temperature Fails, Change the Loss
Alexandre Verine, Florian Le Bronnec, Kunhao Zheng, Alexandre Allauzen, Yann Chevaleyre, Benjamin Negrevergne

TL;DR
This paper explores the limitations of using temperature adjustments to increase diversity in language models and proposes rethinking loss functions to better balance precision and recall.
Contribution
It introduces a novel approach to loss functions based on the Precision-Recall framework, improving the trade-off between diversity and quality in language models.
Findings
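Raising the decoding temperature often fails to improve coverage (Recall), whereas lowering it can improve quality (Precision).
A model must be trained toward coverage to be effectively tunable through temperature adjustments.
Loss functions built on the Precision-Recall framework achieve a substantially better Precision-Recall trade-off than negative log-likelihood training combined with temperature scaling.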
Abstract
Increasing diversity in language models is a challenging yet essential objective. A common approach is to raise the decoding temperature. In this work, we investigate this approach through a simplistic yet common case to provide insights into why decreasing temperature can improve quality (Precision), while increasing it often fails to boost coverage (Recall). Our analysis reveals that for a model to be effectively tunable through temperature adjustments, it must be trained toward coverage. To address this, we propose rethinking loss functions in language models by leveraging the Precision-Recall framework. Our results demonstrate that this approach achieves a substantially better trade-off between Precision and Recall than merely combining negative log-likelihood training with temperature scaling. These findings offer a pathway toward more versatile and robust language modeling…
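To make the temperature mechanism discussed in the abstract concrete, here is a minimal sketch of temperature scaling applied to a toy next-token distribution. It is not taken from the paper: the logits, the function names `apply_temperature` and `entropy`, and the use of entropy as a rough proxy for how spread out the distribution is are all assumptions made for illustration. It only shows how temperature reshapes a fixed distribution, not the paper's Precision-Recall losses.

```python
import math

def apply_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; used here as a rough spread measure."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical next-token logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.0, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = apply_temperature(logits, t)
    print(f"T={t}: probs={[round(p, 3) for p in probs]}, "
          f"entropy={entropy(probs):.3f}")
```

In this toy case, lowering the temperature concentrates probability on the highest-scoring token, and raising it flattens the distribution. Temperature can only reweight the mass the model already assigns, which is in line with the abstract's point that a model has to be trained toward coverage for temperature to be a useful control.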
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
