One-shot Entropy Minimization
Zitian Gao, Lynx Chen, Haoming Luo, Joey Zhou, Bryan Dai

TL;DR
This paper reports that entropy minimization with just one unlabeled data point and 10 optimization steps can improve large language model performance as much as, or more than, rule-based reinforcement learning trained on thousands of examples.
Contribution
It introduces a simple one-shot entropy minimization method that rivals complex reward-based training in large language models.
Findings
Entropy minimization with one data point yields significant performance gains.
Only 10 optimization steps are needed for effective entropy reduction.
The approach matches, and in some cases exceeds, rule-based reinforcement learning trained with thousands of data points and carefully designed rewards.
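To make the idea of "a few steps of entropy minimization" concrete, here is a minimal toy sketch. It is not the authors' implementation: it treats a single hypothetical next-token distribution over four tokens as the only trainable object and updates its logits directly by gradient descent on Shannon entropy. The function names, the toy logits, and the learning rate of 1.0 are all assumptions chosen for illustration; only the step count of 10 comes from the summary above.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy H(p) = -sum_i p_i * log(p_i), in nats.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_grad(logits):
    # Analytic gradient of H(softmax(z)) with respect to the logits z:
    # dH/dz_i = -p_i * (log p_i + H)
    probs = softmax(logits)
    h = entropy(probs)
    return [-p * (math.log(p) + h) if p > 0.0 else 0.0 for p in probs]

def minimize_entropy(logits, steps=10, lr=1.0):
    # Plain gradient descent on the entropy; lr is an illustrative choice.
    z = list(logits)
    history = [entropy(softmax(z))]
    for _ in range(steps):
        g = entropy_grad(z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
        history.append(entropy(softmax(z)))
    return z, history

# Hypothetical logits for one token position; no labels or rewards are used.
toy_logits = [1.0, 0.5, 0.2, 0.1]
final_logits, history = minimize_entropy(toy_logits)
print("entropy before:", history[0])
print("entropy after 10 steps:", history[-1])
```

In this toy setting the update sharpens the distribution around the token the model already prefers, which is the sense in which the objective needs neither labels nor a reward signal. In an actual language model the gradient would flow into the network weights rather than into the logits themselves.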
Abstract
We trained 13,440 large language models and found that entropy minimization requires only a single unlabeled data point and 10 optimization steps to achieve performance improvements comparable to or even greater than those obtained using thousands of data points and carefully designed rewards in rule-based reinforcement learning. This striking result may prompt a rethinking of post-training paradigms for large language models. Our code is available at https://github.com/zitian-gao/one-shot-em.
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Natural Language Processing Techniques
