Orthogonal-Padé Activation Functions: Trainable Activation functions for smooth and faster convergence in deep networks

Koushik Biswas; Shilpak Banerjee; Ashish Kumar Pandey

arXiv:2106.09693·cs.NE·June 18, 2021·1 cites

Open Access

TL;DR

This paper introduces orthogonal-Padé activation functions, which are trainable and lead to faster convergence and higher accuracy in deep neural networks across various datasets and models.

Contribution

It proposes a new class of trainable activation functions called orthogonal-Padé functions, identifying two effective variants, HP-1 and HP-2, that outperform ReLU in multiple deep learning benchmarks.

Findings

01

Compared with ReLU, HP-1 and HP-2 improve top-1 accuracy by 5.06% and 4.63% respectively on CIFAR100 with PreActResNet-34.

02

Across PreActResNet-34, MobileNet V2, LeNet, and EfficientNet B0 on CIFAR10 and CIFAR100, they outperform ReLU by 1.78% to 5.06% in top-1 accuracy.

03

The proposed activations are trainable, learn faster, and improve accuracy on standard deep learning datasets and models.

Abstract

We have proposed orthogonal-Padé activation functions, which are trainable activation functions, and show that they learn faster and improve accuracy on standard deep learning datasets and models. Based on our experiments, we have found the two best candidates out of six orthogonal-Padé activations, which we call safe Hermite-Padé (HP) activation functions, namely HP-1 and HP-2. Compared to ReLU, HP-1 and HP-2 increase top-1 accuracy on the CIFAR100 dataset by 5.06% and 4.63% respectively in PreActResNet-34 and by 3.02% and 2.75% respectively in MobileNet V2, while on the CIFAR10 dataset top-1 accuracy increases by 2.02% and 1.78% respectively in PreActResNet-34, by 2.24% and 2.06% respectively in LeNet, and by 2.15% and 2.03% respectively in EfficientNet B0.
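
The abstract does not give the functional form of the HP activations, so the following is only an illustrative sketch, not the authors' definition. It assumes a "safe" rational form in which the numerator is a weighted sum of probabilists' Hermite polynomials and the denominator is 1 plus a sum of absolute values, which keeps the denominator at or above 1. The function names `hermite_basis` and `safe_hermite_pade`, the coefficient lists `c` and `d`, and the choice of the probabilists' Hermite family are all assumptions made for illustration. In a network, `c` and `d` would be the trainable parameters; here they are plain floats.

```python
from typing import List, Sequence


def hermite_basis(x: float, degree: int) -> List[float]:
    """Return [He_0(x), ..., He_degree(x)] (probabilists' Hermite polynomials).

    Uses the standard recurrence He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x).
    """
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for n in range(1, degree):
        values.append(x * values[n] - n * values[n - 1])
    return values


def safe_hermite_pade(x: float, c: Sequence[float], d: Sequence[float]) -> float:
    """Hypothetical safe rational activation (ASSUMED form, not from the source):

        sum_i c[i] * He_i(x)  /  (1 + sum_j |d[j]| * |He_{j+1}(x)|)

    The denominator is always >= 1, so the function has no poles.
    """
    degree = max(len(c) - 1, len(d))
    basis = hermite_basis(x, degree)
    numerator = sum(ci * basis[i] for i, ci in enumerate(c))
    # Denominator terms start at He_1, so d[j] pairs with basis[j + 1].
    denominator = 1.0 + sum(abs(dj) * abs(basis[j + 1]) for j, dj in enumerate(d))
    return numerator / denominator
```

With an empty `d`, the sketch reduces to a plain Hermite polynomial expansion, for example `safe_hermite_pade(2.0, [0.0, 1.0], [])` evaluates He_1(2) and returns `2.0`.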

Taxonomy

Topics: Neural Networks and Applications · Blind Source Separation Techniques · Sparse and Compressive Sensing Techniques

Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Batch Normalization · Inverted Residual Block · Dropout · Sigmoid Activation · 1x1 Convolution · Convolution