Adaptive Hierarchical Hyper-gradient Descent

Renlong Jie; Junbin Gao; Andrey Vasnev; Minh-Ngoc Tran

arXiv:2008.07277·cs.LG·May 12, 2021

Open Access

TL;DR

This paper introduces a hierarchical hyper-gradient descent method that adaptively learns multiple levels of learning rates, improving optimization performance across various neural network architectures.

Contribution

It proposes a novel multi-level adaptive learning rate method based on hyper-gradient descent, linking regularization of over-parameterized rates with hierarchical adaptive strategies.

Findings

01. Outperforms baseline adaptive methods on multiple architectures

02. Effective across different network types including CNNs and ResNets

03. Demonstrates benefits of hierarchical learning rate adaptation

Abstract

In this study, we investigate learning rate adaptation at different levels based on the hyper-gradient descent framework and propose a method that adaptively learns the optimizer parameters by combining multiple levels of learning rates with hierarchical structures. We also show the relationship between regularizing over-parameterized learning rates and building combinations of adaptive learning rates at different levels. Experiments on several network architectures, including feed-forward networks, LeNet-5 and ResNet-18/34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances.
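
To make the idea of combining learning rates at different levels concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes plain gradient descent, only two levels (one global rate and one rate per parameter), and a fixed mixing weight; the names `hypergradient_sgd`, `gamma`, and `beta` are hypothetical. Each rate is updated by a hyper-gradient step, which uses the product of the current and previous gradients as the derivative of the loss with respect to the rate.

```python
import numpy as np


def hypergradient_sgd(grad_fn, theta0, steps=500, alpha0=0.01, beta=1e-4, gamma=0.5):
    """Gradient descent with two levels of learning rates (illustrative only).

    Assumption: the effective rate is a fixed convex combination
        alpha = gamma * alpha_param + (1 - gamma) * alpha_global
    where alpha_param has one entry per parameter and alpha_global is a scalar.
    beta is the step size used to update the learning rates themselves.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    alpha_global = float(alpha0)
    alpha_param = np.full_like(theta, alpha0)
    prev_grad = np.zeros_like(theta)

    for _ in range(steps):
        g = grad_fn(theta)

        # Since theta_t = theta_{t-1} - alpha * g_{t-1}, the chain rule gives
        #   d f(theta_t) / d alpha_param  = -gamma       * g_t * g_{t-1}  (elementwise)
        #   d f(theta_t) / d alpha_global = -(1 - gamma) * <g_t, g_{t-1}>
        # Descending these hyper-gradients raises a rate when successive
        # gradients agree and lowers it when they point in opposite directions.
        alpha_param = alpha_param + beta * gamma * g * prev_grad
        alpha_global = alpha_global + beta * (1.0 - gamma) * float(g @ prev_grad)

        # Combine the two levels into the rate actually applied to each parameter.
        alpha = gamma * alpha_param + (1.0 - gamma) * alpha_global
        theta = theta - alpha * g
        prev_grad = g

    return theta, alpha_global, alpha_param


if __name__ == "__main__":
    # Toy problem (an assumption for illustration): f(theta) = 0.5 * sum(a * theta**2).
    a = np.array([1.0, 10.0])
    theta, alpha_g, alpha_p = hypergradient_sgd(lambda th: a * th, [1.0, 1.0])
    print("theta:", theta)
    print("global rate:", alpha_g)
    print("per-parameter rates:", alpha_p)
```

A layer-wise level could be added in the same way, by keeping one rate per layer and giving it its own weight in the combination. How the weights are chosen or learned is left open here, since the abstract does not specify it.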

Taxonomy

Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Adam · RMSProp