TL;DR
AdaAct is an adaptive optimization method that stabilizes neuron activations by adjusting learning rates based on activation variance, leading to improved generalization in image classification tasks.
Contribution
Introduces AdaAct, a new optimizer that improves training stability and generalization by applying neuron-wise adaptive learning rates derived from activation variance.
Findings
AdaAct achieves competitive accuracy on CIFAR and ImageNet.
It bridges the gap between Adam's convergence speed and SGD's generalization.
AdaAct keeps execution times competitive with the optimizers it is compared against.
Abstract
We introduce AdaAct, a novel optimization algorithm that adjusts learning rates according to activation variance. Our method enhances the stability of neuron outputs by incorporating neuron-wise adaptivity during the training process, which subsequently leads to better generalization -- a complementary approach to conventional activation regularization methods. Experimental results demonstrate AdaAct's competitive performance across standard image classification benchmarks. We evaluate AdaAct on CIFAR and ImageNet, comparing it with other state-of-the-art methods. Importantly, AdaAct effectively bridges the gap between the convergence speed of Adam and the strong generalization capabilities of SGD, all while maintaining competitive execution times. Code is available at https://github.com/hseung88/adaact.
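The abstract describes the mechanism only at a high level, so the following is a minimal sketch of what "neuron-wise learning rates based on activation variance" could look like for a single linear layer. It is not the authors' implementation (see their repository for that). The function names, the use of an exponential moving average, the uncentered second moment as a variance proxy, and the parameters `beta`, `lr`, and `eps` are all assumptions made for illustration.

```python
import math


def update_activation_variance(running_var, activations, beta=0.999):
    """Update a per-neuron running estimate of activation scale.

    Assumption: an exponential moving average of the uncentered second
    moment of each input neuron stands in for "activation variance".
    activations is a batch of rows, each a list of d_in floats.
    """
    batch = len(activations)
    new_var = []
    for j, old in enumerate(running_var):
        second_moment = sum(row[j] ** 2 for row in activations) / batch
        new_var.append(beta * old + (1.0 - beta) * second_moment)
    return new_var


def scaled_step(weights, grads, running_var, lr=0.1, eps=1e-8):
    """Apply one update to a d_out x d_in weight matrix.

    Assumption: the gradient for every weight attached to input neuron j
    is divided by sqrt(running_var[j]) + eps, so neurons with large,
    unstable activations take smaller steps.
    """
    return [
        [w - lr * g / (math.sqrt(v) + eps)
         for w, g, v in zip(w_row, g_row, running_var)]
        for w_row, g_row in zip(weights, grads)
    ]


# Hypothetical toy data: the second input neuron has 10x larger activations.
acts = [[1.0, 10.0], [-1.0, -10.0]]
var = update_activation_variance([0.0, 0.0], acts, beta=0.0)
new_w = scaled_step([[0.0, 0.0]], [[1.0, 1.0]], var, lr=0.1, eps=0.0)
print(var, new_w)
```

In this toy case both weights receive the same raw gradient, but the weight attached to the high-variance neuron moves ten times less. That is the kind of per-neuron damping the abstract credits with stabilizing neuron outputs.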
Taxonomy
Methods: Stochastic Gradient Descent · Activation Regularization · Adam · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
