dynActivation: A Trainable Activation Family for Adaptive Nonlinearity

Alois Bachmann

arXiv:2603.22154 · cs.LG · March 24, 2026

TL;DR

This paper introduces dynActivation, a trainable activation function that interpolates between base nonlinearities and linearity, improving training efficiency and robustness across vision and language tasks.

Contribution

It proposes a novel trainable activation family, dynActivation, that adapts per layer to enhance deep network training and performance.

Findings

01 Improves training efficiency by up to 54% over ReLU.

02 Enhances robustness, maintaining high accuracy in very deep networks.

03 Achieves significant perplexity reduction in language modeling.

Abstract

This paper proposes dynActivation, a per-layer trainable activation defined as $f_i(x) = \mathrm{BaseAct}(x)\,(\alpha_i - \beta_i) + \beta_i x$, where $\alpha_i$ and $\beta_i$ are lightweight learned scalars that interpolate between the base nonlinearity and a linear path, and $\mathrm{BaseAct}(x)$ stands for any ReLU-like function. The static and dynamic ReLU-like variants are then compared across multiple vision tasks, language modeling tasks, and ablation studies. The results suggest that dynActivation variants tend to linearize deep layers while maintaining high performance, which can improve training efficiency by up to +54% over ReLU. On CIFAR-10, dynActivation(Mish) improves over static Mish by up to +14.02% on AttentionCNN, with an average improvement of +6.00% and a 24% convergence-AUC reduction relative to Mish (2120 vs. 2785). In a 1-to-75-layer MNIST…
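To make the formula in the abstract concrete, the following is a minimal sketch in plain Python, not the author's implementation. The names `DynActivation`, `dyn_activation`, `mish`, and `relu` are illustrative, and the parameter values used are assumptions chosen only to show the limiting cases; the abstract does not state how $\alpha_i$ and $\beta_i$ are initialized. In an actual network these two scalars would be trainable parameters, one pair per layer.

```python
import math
from dataclasses import dataclass
from typing import Callable


def mish(x: float) -> float:
    """Mish base nonlinearity: x * tanh(softplus(x))."""
    # Numerically stable softplus: log(1 + exp(x)).
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def relu(x: float) -> float:
    """ReLU base nonlinearity."""
    return max(x, 0.0)


def dyn_activation(x: float, alpha: float, beta: float,
                   base_act: Callable[[float], float] = mish) -> float:
    """f_i(x) = BaseAct(x) * (alpha_i - beta_i) + beta_i * x."""
    return base_act(x) * (alpha - beta) + beta * x


@dataclass
class DynActivation:
    """One layer's activation with its own scalars (hypothetical wrapper)."""
    alpha: float = 1.0  # assumed value, not taken from the paper
    beta: float = 0.0   # assumed value, not taken from the paper
    base_act: Callable[[float], float] = mish

    def __call__(self, x: float) -> float:
        return dyn_activation(x, self.alpha, self.beta, self.base_act)


# alpha = 1, beta = 0 recovers the base nonlinearity.
pure_base = DynActivation(alpha=1.0, beta=0.0)
# alpha = beta removes the nonlinear term, leaving the linear map beta * x.
pure_linear = DynActivation(alpha=0.7, beta=0.7)
```

With $\alpha_i = 1, \beta_i = 0$ the layer behaves exactly like the base activation, and with $\alpha_i = \beta_i$ it collapses to the linear map $\beta_i x$, which is the sense in which a layer can "linearize" during training.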

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis