SERF: Towards better training of deep neural networks using log-Softplus ERror activation Function

Sayan Nag; Mayukh Bhattacharyya

arXiv:2108.09598·cs.LG·August 26, 2021·5 cites

TL;DR

This paper introduces Serf, a novel self-regularized, nonmonotonic activation function inspired by Swish, which significantly improves training and performance of deep neural networks across various tasks and architectures.

Contribution

The paper proposes Serf, a new activation function that outperforms ReLU, Swish, and Mish, demonstrating its effectiveness and compatibility across diverse deep learning scenarios.

Findings

1. Serf outperforms ReLU, Swish, and Mish in multiple tasks.

2. Serf provides better performance on deeper architectures.

3. Mathematical analysis shows Serf's regularization effect enhances training.

Abstract

Activation functions play a pivotal role in determining the training dynamics and performance of neural networks. The widely adopted activation function ReLU, despite being simple and effective, has a few disadvantages, including the Dying ReLU problem. To tackle such problems, we propose a novel activation function called Serf, which is self-regularized and nonmonotonic in nature. Like Mish, Serf also belongs to the Swish family of functions. Based on several experiments on computer vision (image classification and object detection) and natural language processing (machine translation, sentiment classification and multimodal entailment) tasks with different state-of-the-art architectures, it is observed that Serf vastly outperforms ReLU (baseline) and other activation functions including both Swish and Mish, with a markedly bigger margin on deeper architectures. Ablation studies further…
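The abstract excerpt above does not spell out the formula. Reading the name "log-Softplus ERror" as the error function applied to softplus, Serf can be sketched as f(x) = x · erf(ln(1 + e^x)). The scalar, standard-library-only version below is an illustrative sketch under that assumption, not the authors' reference implementation; the helper names `softplus` and `serf` are chosen here for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    # max(x, 0) + log1p(exp(-|x|)) avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def serf(x: float) -> float:
    """Serf activation, assumed form: x * erf(softplus(x))."""
    return x * math.erf(softplus(x))


if __name__ == "__main__":
    # Small negative inputs give small negative outputs rather than a hard
    # zero as in ReLU, and the curve dips before rising back toward zero.
    for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"serf({value:+.1f}) = {serf(value):+.6f}")
```

Under this form the output at -1 is lower than at both -5 and 0, which is the nonmonotonic dip the abstract refers to, while large positive inputs pass through almost unchanged.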

Taxonomy

Topics: Machine Learning in Materials Science · Machine Learning and ELM · Neural Networks and Applications

Methods: Serf · Tanh Activation · Sigmoid Activation · Dropout