dynActivation: A Trainable Activation Family for Adaptive Nonlinearity
Alois Bachmann

TL;DR
This paper introduces dynActivation, a trainable activation function that interpolates between base nonlinearities and linearity, improving training efficiency and robustness across vision and language tasks.
Contribution
It proposes a novel trainable activation family, dynActivation, that adapts per layer to enhance deep network training and performance.
Findings
Improves training efficiency by up to 54% over ReLU.
Enhances robustness, maintaining high accuracy in very deep networks.
Achieves significant perplexity reduction in language modeling.
Abstract
This paper proposes dynActivation, a per-layer trainable activation in which two lightweight learned scalars interpolate between a base nonlinearity and a linear path; the base nonlinearity can be any ReLU-like function. The static and dynamic ReLU-like variants are then compared across multiple vision tasks, language modeling tasks, and ablation studies. The results suggest that dynActivation variants tend to linearize deep layers while maintaining high performance, which can improve training efficiency by up to 54% over ReLU. On CIFAR-10, dynActivation(Mish) improves over static Mish on average, with the largest gain on AttentionCNN, and reduces convergence AUC relative to Mish (2120 vs. 2785). In a 1-to-75-layer MNIST…
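The idea of two learned scalars blending a base nonlinearity with a linear path can be illustrated with a small sketch. The exact formula did not survive in this listing, so the form used here, `a * base(x) + b * x`, is only an assumed parameterization for illustration and may differ from the paper's definition. The names `DynActivationSketch`, `a`, and `b` are hypothetical, and the scalars are plain floats rather than trained parameters.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU."""
    return max(0.0, x)


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x)), with a numerically stable softplus."""
    softplus = math.log1p(math.exp(-abs(x))) + max(x, 0.0)
    return x * math.tanh(softplus)


class DynActivationSketch:
    """Hypothetical per-layer activation: a * base(x) + b * x.

    ASSUMPTION: this blend is an illustrative stand-in, not the paper's
    formula. In a real network, a and b would be trainable per-layer scalars.
    """

    def __init__(self, base, a: float = 1.0, b: float = 0.0):
        self.base = base
        self.a = a  # weight on the base nonlinearity
        self.b = b  # weight on the linear (identity) path

    def __call__(self, x: float) -> float:
        return self.a * self.base(x) + self.b * x


# a=1, b=0 recovers the base nonlinearity; a=0, b=1 is purely linear.
pure_relu = DynActivationSketch(relu, a=1.0, b=0.0)
pure_linear = DynActivationSketch(relu, a=0.0, b=1.0)
half_blend = DynActivationSketch(relu, a=0.5, b=0.5)
```

Under this assumed form, a layer whose learned scalars drift toward `a = 0`, `b = 1` behaves like an identity mapping, which is one way to picture the abstract's remark that deep layers tend to linearize.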
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
