Entropy-UID: A Method for Optimizing Information Density
Xinpeng Shou

TL;DR
Entropy-UID introduces an adaptive token selection method that balances entropy and Uniform Information Density principles, improving the efficiency and naturalness of language generation models.
Contribution
The paper proposes Entropy-UID, a novel token selection approach that jointly minimizes entropy and surprisal for better information distribution in text generation.
Findings
Achieves lower surprisal and entropy variance than baseline models.
Produces more balanced and human-like generated text.
Validated on multiple benchmark datasets with consistent improvements.
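The "surprisal variance" named in the findings can be made concrete with a small sketch. It assumes the metric is the population variance of per-token surprisal (in bits) over a sequence, a common way to operationalize UID. The summary does not state the paper's exact definition, and the function names and example probabilities below are hypothetical.

```python
import math
from statistics import pvariance


def surprisals(token_probs):
    """Per-token surprisal in bits, given the model probability of each emitted token."""
    return [-math.log2(p) for p in token_probs]


def surprisal_variance(token_probs):
    """Population variance of surprisal; lower means information is spread more evenly."""
    return pvariance(surprisals(token_probs))


# Illustrative (made-up) probabilities: an even sequence vs. one with an information spike.
even_seq = [0.25, 0.25, 0.25, 0.25]
spiky_seq = [0.5, 0.5, 0.5, 0.001]

print(surprisal_variance(even_seq))
print(surprisal_variance(spiky_seq))
```

Under this assumed definition, a sequence whose tokens are all equally predictable has zero variance, while a single highly improbable token raises it sharply.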
Abstract
Balanced and efficient information flow is essential for optimizing language generation models. In this work, we propose Entropy-UID, a new token selection method that balances entropy and Uniform Information Density (UID) principles for enhanced efficiency of text generation. Our approach adaptively adjusts token selection by jointly minimizing entropy and surprisal, promoting more even information distribution across generated sequences. Theoretical validation demonstrates that Entropy-UID optimally reduces information spikes while maintaining fluency and coherence. The method was evaluated using information-theoretic metrics on multiple benchmark datasets, including WikiText-2, OpenWebText, and WMT. Experimental results show that Entropy-UID achieves lower surprisal and entropy variance compared to standard GPT-2 and alternative heuristics, leading to more balanced and human-like…
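A minimal sketch of what "jointly minimizing entropy and surprisal" could look like at a single decoding step. This is an illustration under stated assumptions, not the paper's algorithm. It assumes each candidate token is scored by a weighted sum of its own surprisal and the entropy of the next-step distribution that follows it. The weight `alpha`, the callable `next_dist`, and the toy model are all hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy (nats) of a {token: probability} distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def surprisal(p):
    """Surprisal (nats) of an event with probability p."""
    return -math.log(p)


def select_token(next_dist, prefix, alpha=0.5):
    """Return (token, score) minimizing alpha * entropy + (1 - alpha) * surprisal.

    Assumption: entropy is taken over the distribution that would follow
    each candidate token; the summary does not specify this detail.
    """
    best_tok, best_score = None, float("inf")
    for tok, p in next_dist(prefix).items():
        if p <= 0.0:
            continue  # skip impossible tokens (infinite surprisal)
        lookahead = next_dist(prefix + (tok,))
        score = alpha * entropy(lookahead) + (1.0 - alpha) * surprisal(p)
        if score < best_score:
            best_tok, best_score = tok, score
    return best_tok, best_score


def toy_next_dist(prefix):
    """Hypothetical two-token model standing in for a real language model."""
    if prefix and prefix[-1] == "a":
        return {"a": 0.98, "b": 0.02}  # very predictable continuation
    if prefix and prefix[-1] == "b":
        return {"a": 0.5, "b": 0.5}    # maximally uncertain continuation
    return {"a": 0.4, "b": 0.6}


for alpha in (0.0, 0.5, 1.0):
    print(alpha, select_token(toy_next_dist, (), alpha))
```

With `alpha = 0` the rule reduces to greedy decoding (lowest surprisal), and with `alpha = 1` it prefers the token that leads to the most predictable continuation. Intermediate values trade the two off.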
Taxonomy
Topics: Neural Networks and Applications
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dense Connections · Attention Dropout · Discriminative Fine-Tuning · Multi-Head Attention · Adam · Softmax
