Partial local entropy and anisotropy in deep weight spaces
Daniele Musso

TL;DR
This paper introduces partial local entropy loss functions that adapt to the anisotropy of weight space in deep neural networks. They improve optimization and shed light on the shape of the loss landscape and on training dynamics.
Contribution
It proposes partial local entropies, a new class of loss functions that restrict the smoothing regularization to a subset of weights. This helps both to understand and to exploit anisotropic minima in deep learning.
Findings
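Partial local entropies outperform their isotropic counterparts in image classification experiments with multi-layer, fully-connected and convolutional networks.
At late training times, the temperatures of all layers follow a common cooling behavior.
The method provides direct probes of the shape of the minima reached by stochastic gradient descent and of its training dynamics.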
Abstract
We refine a recently proposed class of local entropic loss functions by restricting the smoothing regularization to only a subset of weights. The new loss functions are referred to as partial local entropies. They can adapt to the weight-space anisotropy, thus outperforming their isotropic counterparts. We support the theoretical analysis with experiments on image classification tasks performed with multi-layer, fully-connected and convolutional neural networks. The present study suggests how to better exploit the anisotropic nature of deep landscapes and provides direct probes of the shape of the minima encountered by stochastic gradient descent algorithms. As a by-product, we observe an asymptotic dynamical regime at late training times where the temperature of all the layers obeys a common cooling behavior.
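To make the idea of restricting the smoothing to a subset of weights concrete, here is a minimal sketch. It is not the paper's implementation. It assumes an Entropy-SGD-style local entropy F(w) = log ∫ dw'_S exp(-beta L(w'_S, w_U) - gamma/2 ||w_S - w'_S||^2), in which only the weights in a chosen subset S (for example one layer, selected by a boolean mask) are integrated over and coupled to w, while the remaining weights U stay fixed. The function name `partial_local_entropy_grad` and its parameters are hypothetical. The Gibbs average is estimated with unadjusted Langevin dynamics, which is a swapped-in sampler chosen for illustration.

```python
import numpy as np


def partial_local_entropy_grad(w, loss_grad, mask, gamma=1.0, beta=1.0,
                               n_steps=2000, step=1e-2, burn_in=500, seed=None):
    """Monte Carlo estimate of the gradient of -F(w) for a partial local entropy.

    Hypothetical definition, with S = {i : mask[i]} and U its complement:
        F(w) = log ∫ dw'_S exp(-beta * L(w'_S, w_U) - gamma/2 * ||w_S - w'_S||^2)
    Differentiating under the integral gives
        d(-F)/dw_S = gamma * (w_S - <w'_S>)
        d(-F)/dw_U = beta * <dL/dw_U (w'_S, w_U)>
    where <.> is the Gibbs average, sampled here with unadjusted Langevin steps.
    """
    if burn_in >= n_steps:
        raise ValueError("burn_in must be smaller than n_steps")
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    wp = w.copy()               # replica w'; coordinates in U stay pinned to w
    sum_wp = np.zeros_like(w)   # running sum of w' samples after burn-in
    sum_g = np.zeros_like(w)    # running sum of dL/dw evaluated at w'
    for t in range(n_steps):
        g = loss_grad(wp)
        if t >= burn_in:
            sum_wp += wp
            sum_g += g
        # Langevin step on the energy beta*L(w') + gamma/2*||w - w'||^2,
        # applied only to the smoothed coordinates S (U is reset to w).
        drift = -(beta * g + gamma * (wp - w))
        noise = np.sqrt(2.0 * step) * rng.standard_normal(w.shape)
        wp = np.where(mask, wp + step * drift + noise, w)
    n = n_steps - burn_in
    mean_wp, mean_g = sum_wp / n, sum_g / n
    return np.where(mask, gamma * (w - mean_wp), beta * mean_g)


if __name__ == "__main__":
    quad_grad = lambda v: v  # toy loss L(w) = ||w||^2 / 2
    w0 = np.array([1.0, -2.0])
    # Smooth only the first coordinate; the second is left unsmoothed.
    print(partial_local_entropy_grad(w0, quad_grad, mask=[True, False],
                                     n_steps=20000, burn_in=2000, seed=0))
```

With an all-False mask, the estimate reduces to beta times the plain loss gradient. With an all-True mask, it becomes an isotropic local-entropy gradient. For the toy quadratic, the exact answer is gamma*beta/(beta+gamma) * w on S and beta * w on U, so the printed value should be roughly [0.5, -2.0]. In a real network, the mask would select a layer or another subset of weights, which is where the anisotropy discussed above would come into play.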
