Information-Theoretic Perspectives on Optimizers
Zhiquan Tan, Weiran Huang

TL;DR
This paper introduces an information-theoretic metric, the entropy gap, to analyze how optimizers and architectures interact in neural networks. It shows that the entropy gap, together with sharpness, affects optimization dynamics and generalization, and it uses the same tools to improve the Lion optimizer.
Contribution
It proposes the entropy gap as a new metric to analyze optimizer-architecture interplay and uses information theory to enhance the Lion optimizer.
Findings
Both sharpness and the entropy gap affect optimization dynamics and generalization.
Sharpness alone does not fully explain the optimizer-architecture interplay; the entropy gap adds further insight.
Information-theoretic tools suggest ways to improve the Lion optimizer.
Abstract
The interplay between optimizers and architectures in neural networks is complicated, and it is hard to understand why some optimizers work better on specific architectures. In this paper, we find that the traditionally used sharpness metric does not fully explain this interplay, and we introduce an information-theoretic metric called the entropy gap to aid the analysis. We find that both sharpness and the entropy gap affect performance, including optimization dynamics and generalization. We further use information-theoretic tools to understand a recently proposed optimizer called Lion and find ways to improve it.
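For readers unfamiliar with Lion, the following is a minimal sketch of the baseline Lion update as it is commonly described: the sign of an interpolation between momentum and gradient, with decoupled weight decay. It is not the improved variant the abstract refers to, which is not specified here. The function names `sign` and `lion_step`, the list-of-floats parameter layout, and the default hyperparameters are assumptions made for illustration.

```python
from typing import List, Tuple


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(
    params: List[float],
    grads: List[float],
    momentum: List[float],
    lr: float = 1e-4,
    beta1: float = 0.9,
    beta2: float = 0.99,
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """One baseline Lion step (illustrative sketch, hypothetical helper)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient, then keep only the sign,
        # so every coordinate moves by the same magnitude lr.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay is added to the signed update.
        new_params.append(p - lr * (update + weight_decay * p))
        # The momentum buffer is refreshed with a second coefficient, beta2.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


if __name__ == "__main__":
    # Toy usage: minimize f(x) = x^2 for a single parameter (gradient 2x).
    x, m = [1.0], [0.0]
    for _ in range(5):
        x, m = lion_step(x, [2.0 * x[0]], m, lr=0.1)
    print(x, m)
```

Because the update keeps only the sign, every coordinate changes by exactly `lr` per step (before weight decay), regardless of gradient scale.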
