SERF: Towards better training of deep neural networks using log-Softplus ERror activation Function
Sayan Nag, Mayukh Bhattacharyya

TL;DR
This paper introduces Serf, a novel self-regularized, nonmonotonic activation function inspired by Swish, which significantly improves training and performance of deep neural networks across various tasks and architectures.
Contribution
The paper proposes Serf, a new activation function that outperforms ReLU, Swish, and Mish, demonstrating its effectiveness and compatibility across diverse deep learning scenarios.
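As a concrete reference, Serf is defined as f(x) = x · erf(ln(1 + e^x)), that is, the input multiplied by the error function of its softplus. The snippet below is a minimal scalar sketch using only the Python standard library, not the authors' implementation; the helper names `softplus` and `serf` are chosen here for illustration.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable form of ln(1 + e^x): avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def serf(x: float) -> float:
    # Serf: the input scaled by the error function of its softplus.
    return x * math.erf(softplus(x))


if __name__ == "__main__":
    # Print a few sample points on both sides of zero.
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"serf({x:+.1f}) = {serf(x):+.4f}")
```

Evaluating a few negative inputs shows the nonmonotonic shape: the output dips below zero for moderately negative x and returns toward zero as x becomes more negative, while for large positive x it approaches the identity.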
Findings
Serf outperforms ReLU, Swish, and Mish in multiple tasks.
Serf provides better performance on deeper architectures.
Mathematical analysis shows Serf's regularization effect enhances training.
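To make the gradient behaviour concrete, the derivative of Serf can be worked out from its definition f(x) = x · erf(ln(1 + e^x)) with the product and chain rules. This is a sketch derived from the definition alone; the paper's own analysis may arrange the terms differently. The symbols s(x) for softplus and σ(x) for the logistic sigmoid are notation introduced here.

```latex
\begin{aligned}
s(x) &= \ln\!\left(1 + e^{x}\right), \qquad s'(x) = \sigma(x) = \frac{1}{1 + e^{-x}} \\
f(x) &= x \,\operatorname{erf}\!\big(s(x)\big) \\
f'(x) &= \operatorname{erf}\!\big(s(x)\big)
        + \frac{2}{\sqrt{\pi}}\, x \, e^{-s(x)^{2}} \, \sigma(x)
\end{aligned}
```

The first term equals f(x)/x for x ≠ 0. The second term is smooth and decays quickly as s(x) grows, so the gradient approaches 1 for large positive inputs.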
Abstract
Activation functions play a pivotal role in determining the training dynamics and performance of neural networks. The widely adopted ReLU activation function, despite being simple and effective, has a few disadvantages, including the dying ReLU problem. To tackle such problems, we propose a novel activation function called Serf, which is self-regularized and nonmonotonic in nature. Like Mish, Serf belongs to the Swish family of functions. Based on several experiments on computer vision (image classification and object detection) and natural language processing (machine translation, sentiment classification, and multimodal entailment) tasks with different state-of-the-art architectures, it is observed that Serf vastly outperforms ReLU (the baseline) and other activation functions, including both Swish and Mish, with a markedly bigger margin on deeper architectures. Ablation studies further…
Taxonomy
Topics: Machine Learning in Materials Science · Machine Learning and ELM · Neural Networks and Applications
Methods: Serf · Tanh Activation · Sigmoid Activation · Dropout
