An Adaptive Method Stabilizing Activations for Enhanced Generalization

Hyunseok Seung; Jaewoo Lee; Hyunsuk Ko

arXiv:2506.08353·cs.LG·June 11, 2025

TL;DR

AdaAct is an adaptive optimization method that stabilizes neuron activations by adjusting learning rates based on activation variance, leading to improved generalization in image classification tasks.

Contribution

Introduces AdaAct, a new optimizer that stabilizes neuron outputs and improves generalization by setting neuron-wise adaptive learning rates from activation variance.

Findings

01. AdaAct achieves competitive accuracy on CIFAR and ImageNet.

02. It bridges the gap between Adam's convergence speed and SGD's generalization.

03. AdaAct maintains efficient training times.

Abstract

We introduce AdaAct, a novel optimization algorithm that adjusts learning rates according to activation variance. Our method enhances the stability of neuron outputs by incorporating neuron-wise adaptivity during the training process, which subsequently leads to better generalization -- a complementary approach to conventional activation regularization methods. Experimental results demonstrate AdaAct's competitive performance across standard image classification benchmarks. We evaluate AdaAct on CIFAR and ImageNet, comparing it with other state-of-the-art methods. Importantly, AdaAct effectively bridges the gap between the convergence speed of Adam and the strong generalization capabilities of SGD, all while maintaining competitive execution times. Code is available at https://github.com/hseung88/adaact.
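
The abstract describes the idea only at a high level, so the following is a rough sketch of what "neuron-wise learning rates driven by activation statistics" could look like for a single linear layer. It is not the authors' algorithm; see the linked repository for that. Everything here is an assumption made for illustration: the function name `activation_scaled_step`, the use of a running uncentered second moment as a stand-in for activation variance, and the parameters `beta` and `eps`.

```python
import math


def activation_scaled_step(weights, grads, activations, second_moment,
                           lr=0.1, beta=0.9, eps=1e-8):
    """One illustrative update for a single linear layer (hypothetical helper).

    weights, grads : nested lists of shape [n_out][n_in]
    activations    : batch of layer inputs, shape [batch][n_in]
    second_moment  : running per-input-neuron statistic, length n_in;
                     updated in place (assumed proxy for activation variance)
    """
    batch = len(activations)
    n_in = len(second_moment)

    # Update the running mean of squared activations, one value per input neuron.
    for j in range(n_in):
        mean_sq = sum(sample[j] ** 2 for sample in activations) / batch
        second_moment[j] = beta * second_moment[j] + (1 - beta) * mean_sq

    # Scale each gradient column by its neuron's statistic: weights fed by
    # large-magnitude activations take smaller steps.
    new_weights = []
    for w_row, g_row in zip(weights, grads):
        new_weights.append([
            w - lr * g / (math.sqrt(second_moment[j]) + eps)
            for j, (w, g) in enumerate(zip(w_row, g_row))
        ])
    return new_weights


# Usage: two input neurons, the second with 10x larger activations.
state = [0.0, 0.0]
updated = activation_scaled_step(
    weights=[[1.0, 1.0]],
    grads=[[1.0, 1.0]],
    activations=[[1.0, 10.0], [1.0, 10.0]],
    second_moment=state,
    beta=0.0,  # no averaging, so the statistic equals the batch value
)
```

In this toy setup both weights receive the same gradient, but the weight attached to the high-magnitude neuron moves about ten times less. That is the kind of per-neuron scaling the abstract alludes to, in contrast to Adam-style per-parameter scaling based on gradient statistics.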

Code & Models

Repositories

hseung88/adaact (PyTorch, official)

Taxonomy

Methods: Stochastic Gradient Descent · Activation Regularization · Adam