One-shot Entropy Minimization

Zitian Gao; Lynx Chen; Haoming Luo; Joey Zhou; Bryan Dai

arXiv:2505.20282 · cs.CL · August 22, 2025


TL;DR

This paper shows that entropy minimization on a single unlabeled data point, run for just 10 optimization steps, improves large language models as much as or more than rule-based reinforcement learning with thousands of examples and carefully designed rewards.

Contribution

It introduces a simple one-shot entropy minimization method that rivals complex reward-based training in large language models.
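For reference, a common token-level form of the entropy-minimization objective is written out here. This is an assumption about the general technique, not necessarily the paper's exact loss: $x$ is the single unlabeled prompt, $y$ a response sampled from the model $p_\theta$, $T$ the response length, and $\mathcal{V}$ the vocabulary. Training lowers $\mathcal{L}_{\mathrm{EM}}$ by gradient descent on $\theta$, which makes the model more confident in its own predictions without any label or reward.

$$
\mathcal{L}_{\mathrm{EM}}(\theta) = \frac{1}{T}\sum_{t=1}^{T} H\!\left(p_\theta(\cdot \mid x, y_{<t})\right),
\qquad
H(p) = -\sum_{v \in \mathcal{V}} p(v)\,\log p(v)
$$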

Findings

01

Entropy minimization with one unlabeled data point yields significant performance gains.

02

Only 10 optimization steps are needed to achieve the improvement.

03

The approach matches or exceeds the gains of rule-based reinforcement learning trained on thousands of data points with carefully designed rewards.

Abstract

We trained 13,440 large language models and found that entropy minimization requires only a single unlabeled data point and 10 optimization steps to achieve performance improvements comparable to, or even greater than, those obtained with thousands of data points and carefully designed rewards in rule-based reinforcement learning. This striking result may prompt a rethinking of post-training paradigms for large language models. Our code is available at https://github.com/zitian-gao/one-shot-em.
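To make the "10 steps" concrete, here is a toy sketch in plain Python. It assumes a token-level objective (the mean entropy of the next-token distribution across the positions of one response) and, unlike a real run, treats the per-position logits themselves as the trainable parameters instead of a network's weights. `toy_logits`, `minimize_entropy`, and the learning rate are hypothetical and not taken from the paper's repository. The gradient uses the standard identity dH/dz_j = -p_j (log p_j + H) for H = -sum_i p_i log p_i with p = softmax(z).

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_token_entropy(logit_rows):
    # Hypothetical objective: average entropy over all response positions.
    return sum(entropy(softmax(row)) for row in logit_rows) / len(logit_rows)


def entropy_grad(logits):
    # Analytic gradient of H(softmax(z)) w.r.t. z: -p_j * (log p_j + H).
    probs = softmax(logits)
    h = entropy(probs)
    return [-p * (math.log(p) + h) if p > 0 else 0.0 for p in probs]


def minimize_entropy(logit_rows, steps=10, lr=0.5):
    # Plain gradient descent on the mean token entropy (toy stand-in for
    # updating model weights). Returns final logits and the entropy history.
    rows = [list(r) for r in logit_rows]
    n = len(rows)
    history = [mean_token_entropy(rows)]
    for _ in range(steps):
        for i, row in enumerate(rows):
            g = entropy_grad(row)
            # Divide by n because the loss is a mean over positions.
            rows[i] = [z - lr * gj / n for z, gj in zip(row, g)]
        history.append(mean_token_entropy(rows))
    return rows, history


# Hypothetical per-position logits for one unlabeled prompt:
# 3 generated tokens over a vocabulary of 5.
toy_logits = [
    [2.0, 1.0, 0.5, 0.0, -1.0],
    [0.3, 0.2, 0.1, 0.0, -0.1],
    [1.5, 1.4, -0.5, -2.0, 0.0],
]
final_rows, history = minimize_entropy(toy_logits, steps=10, lr=0.5)
print(f"mean token entropy: {history[0]:.3f} -> {history[-1]:.3f}")
```

Each step raises the logits of tokens that are already more likely than average and lowers the rest, so every distribution sharpens while its most likely token stays the same. In the actual method the same signal flows into the model's parameters; whether that sharpening improves task accuracy is the paper's empirical claim, which this toy does not demonstrate.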


Code & Models

Repositories

zitian-gao/one-shot-em
PyTorch · Official

Models

🤗 zgao3186/qwen25math7b-one-shot-em (model)


Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Natural Language Processing Techniques