Information-Theoretic Perspectives on Optimizers

Zhiquan Tan; Weiran Huang

arXiv:2502.20763·cs.LG·March 3, 2025

TL;DR

This paper introduces information-theoretic metrics, such as the entropy gap, to better understand how optimizers and architectures interact in neural networks. It shows how this interplay affects performance and uses that understanding to guide optimizer improvements.

Contribution

The paper proposes the entropy gap as a new metric for analyzing the interplay between optimizers and architectures, and uses information theory to improve the Lion optimizer.

Findings

1. Both sharpness and the entropy gap influence optimization dynamics and generalization.

2. The entropy gap captures aspects of the optimizer-architecture interplay that sharpness alone does not explain.

3. Information-theoretic analysis of the Lion optimizer suggests ways to improve it.
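Sharpness, the baseline metric in these findings, is often measured as the largest eigenvalue of the training-loss Hessian at the current parameters. Assuming that definition (the paper may use a different one, such as a SAM-style worst-case loss increase), the NumPy sketch below estimates it by power iteration on Hessian-vector products. It approximates those products with central finite differences of a gradient function. The names `grad_fn`, `hessian_vector_product` and `top_hessian_eigenvalue` are hypothetical helpers, not part of the paper.

```python
import numpy as np


def hessian_vector_product(grad_fn, theta, v, eps=1e-4):
    # Central finite difference of the gradient along direction v approximates H @ v.
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)


def top_hessian_eigenvalue(grad_fn, theta, iters=100, seed=0):
    """Estimate sharpness as the dominant Hessian eigenvalue via power iteration.

    Power iteration finds the largest-magnitude eigenvalue; near a minimum the
    Hessian is approximately positive semidefinite, so this is the top eigenvalue.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(theta.shape)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(iters):
        hv = hessian_vector_product(grad_fn, theta, v)
        eig = float(np.vdot(v, hv))  # Rayleigh quotient for the current unit vector
        norm = np.linalg.norm(hv)
        if norm == 0.0:
            break
        v = hv / norm
    return eig
```

In a real network, `grad_fn` would return the gradient of the loss on a fixed batch. Autodiff frameworks can supply exact Hessian-vector products in place of the finite difference used here.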

Abstract

The interplay of optimizers and architectures in neural networks is complicated, and it is hard to understand why some optimizers work better on specific architectures. In this paper, we find that the traditionally used sharpness metric does not fully explain this interplay, and we introduce an information-theoretic metric, the entropy gap, to help analyze it. We find that both sharpness and the entropy gap affect performance, including optimization dynamics and generalization. We further use information-theoretic tools to understand a recently proposed optimizer called Lion and find ways to improve it.
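For reference, here is a minimal NumPy sketch of the standard Lion update from Chen et al. (2023). It is the baseline optimizer the paper analyzes, not the improved variant the authors propose, whose details are not given here. The function name `lion_step` and the `lr` default are illustrative assumptions. The defaults β1 = 0.9 and β2 = 0.99 follow the original Lion paper.

```python
import numpy as np


def lion_step(theta, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One standard Lion update (Chen et al., 2023). Returns (new_theta, new_m).

    Hypothetical helper for illustration; not the paper's modified optimizer.
    """
    # Interpolate between the momentum and the current gradient, keep only the sign.
    c = beta1 * m + (1.0 - beta1) * grad
    update = np.sign(c)
    # Decoupled weight decay is applied together with the sign update.
    new_theta = theta - lr * (update + weight_decay * theta)
    # The momentum is tracked with a separate, typically larger, coefficient beta2.
    new_m = beta2 * m + (1.0 - beta2) * grad
    return new_theta, new_m
```

Because only the sign of `c` is used, every coordinate with nonzero `c` moves by exactly `lr` per step (plus the weight-decay term), regardless of the gradient's magnitude.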