Adaptive Hierarchical Hyper-gradient Descent
Renlong Jie, Junbin Gao, Andrey Vasnev, Minh-Ngoc Tran

TL;DR
This paper introduces a hierarchical hyper-gradient descent method that adaptively learns multiple levels of learning rates, improving optimization performance across various neural network architectures.
Contribution
It proposes a novel multi-level adaptive learning rate method based on hyper-gradient descent, linking regularization of over-parameterized rates with hierarchical adaptive strategies.
Findings
Outperforms baseline adaptive methods on multiple architectures
Effective across different network types including CNNs and ResNets
Demonstrates benefits of hierarchical learning rate adaptation
Abstract
In this study, we investigate learning rate adaptation at different levels based on the hyper-gradient descent framework and propose a method that adaptively learns the optimizer parameters by combining multiple levels of learning rates with hierarchical structures. We also show the relationship between regularizing over-parameterized learning rates and building combinations of adaptive learning rates at different levels. Experiments on several network architectures, including feed-forward networks, LeNet-5 and ResNet-18/34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances.
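To make the idea of combining learning rates at several levels concrete, here is a minimal sketch, not the authors' algorithm. It assumes plain gradient descent on a toy quadratic, two levels only (one global rate and one rate per parameter), and a fixed combination weight `gamma`. The function name, the hyper-learning rate `beta`, and all default values are hypothetical choices for illustration. Each level is adapted with the standard hyper-gradient rule, which raises a rate when consecutive gradients agree and lowers it when they disagree.

```python
import numpy as np


def two_level_hypergradient_descent(grad_fn, theta0, alpha0=0.01, beta=1e-4,
                                    gamma=0.5, steps=200):
    """Gradient descent with a global and a per-parameter learning rate.

    Illustrative assumption: the effective step size is the fixed convex
    combination gamma * alpha_global + (1 - gamma) * alpha_param.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    alpha_global = float(alpha0)                # one rate shared by all parameters
    alpha_param = np.full_like(theta, alpha0)   # one rate per parameter
    prev_grad = np.zeros_like(theta)            # no previous gradient at step 0

    for _ in range(steps):
        grad = grad_fn(theta)
        # Hyper-gradient updates: dot product for the global level,
        # element-wise product for the parameter level.
        alpha_global += beta * float(grad @ prev_grad)
        alpha_param += beta * grad * prev_grad
        # Combine the two levels into the step size actually used.
        alpha = gamma * alpha_global + (1.0 - gamma) * alpha_param
        theta -= alpha * grad
        prev_grad = grad

    return theta, alpha_global, alpha_param


# Toy problem (assumed for illustration): f(theta) = 0.5 * sum(a * theta**2).
a = np.array([1.0, 10.0])
theta, alpha_global, alpha_param = two_level_hypergradient_descent(
    lambda th: a * th, theta0=[1.0, 1.0]
)
print(theta, alpha_global, alpha_param)
```

In this sketch the combination weight is fixed, which acts like pulling the per-parameter rates toward the shared global rate. The paper's method covers more levels (for example layer-wise rates) and adaptive optimizers, which are not reproduced here.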
Taxonomy
Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Adam · RMSProp
