Dynamic learning rate using Mutual Information

Shrihari Vasudevan

arXiv:1805.07249 · cs.LG · June 27, 2018 · 6 cites

Open Access

TL;DR

This paper introduces a method to dynamically adjust the learning rate of deep neural networks during training by using Mutual Information between outputs and true labels, aiming to improve training efficiency and performance.

Contribution

It proposes a novel approach to set learning rates dynamically based on Mutual Information, extending the idea to layer-wise adaptation without prescribing a specific policy.

Findings

01

Mutual Information can effectively guide dynamic learning rate adjustment.

02

The method achieves competitive or superior results compared to traditional fixed or scheduled learning rates.

03

Dynamic MI-based adjustment can improve training efficiency and model performance.

Abstract

This paper demonstrates dynamic hyper-parameter setting for deep neural network training using Mutual Information (MI). The specific hyper-parameter studied is the learning rate. MI between the output layer and the true outcomes is used to dynamically set the learning rate of the network through the training cycle; the idea is also extended to layer-wise setting of the learning rate. Two approaches are demonstrated: tracking the relative change in MI and, additionally, tracking its value relative to a reference measure. The paper does not attempt to recommend a specific learning rate policy. Experiments demonstrate that MI may be used effectively to set the learning rate dynamically and to achieve outcomes that are competitive or better, in training time that is competitive or better.
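To make the first approach in the abstract concrete, the following is a minimal sketch of tracking the relative change in MI between predicted and true labels and lowering the learning rate when that change stalls. It is an illustration under stated assumptions, not the paper's procedure: the plug-in MI estimate from label counts, the function names `mutual_information` and `update_learning_rate`, and the parameters `tol`, `decay`, and `min_lr` are all hypothetical choices made here, and the abstract itself declines to recommend a specific policy.

```python
import math
from collections import Counter


def mutual_information(y_true, y_pred):
    """Plug-in estimate of MI (in nats) between two discrete label sequences.

    Assumption: MI is estimated from empirical label counts. The paper may
    use a different estimator.
    """
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))   # counts of (true, predicted) pairs
    count_true = Counter(y_true)           # marginal counts of true labels
    count_pred = Counter(y_pred)           # marginal counts of predicted labels
    mi = 0.0
    for (t, p), c in joint.items():
        # p(t, p) * log( p(t, p) / (p(t) * p(p)) ), written in terms of counts
        mi += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
    return mi


def update_learning_rate(lr, mi_prev, mi_curr, tol=0.01, decay=0.5, min_lr=1e-6):
    """Hypothetical rule: decay lr when the relative MI gain drops below tol."""
    if mi_prev <= 0.0:
        # No usable reference value yet, so keep the current rate.
        return lr
    rel_change = (mi_curr - mi_prev) / mi_prev
    if rel_change < tol:
        return max(lr * decay, min_lr)
    return lr


if __name__ == "__main__":
    # Toy values standing in for per-epoch MI measured on held-out data.
    lr, mi_prev = 0.1, 0.0
    for mi_curr in [0.20, 0.35, 0.40, 0.401]:
        lr = update_learning_rate(lr, mi_prev, mi_curr)
        mi_prev = mi_curr
        print(f"MI={mi_curr:.3f} lr={lr}")
```

In a training loop, `mutual_information` would be evaluated once per epoch on the network's predicted labels, and the returned rate would be written back to the optimizer. The second approach in the abstract, comparing MI against a reference measure, would need an additional term that this sketch does not attempt.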

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Adversarial Robustness in Machine Learning