Orthogonal-Padé Activation Functions: Trainable Activation functions for smooth and faster convergence in deep networks
Koushik Biswas, Shilpak Banerjee, Ashish Kumar Pandey

TL;DR
This paper introduces orthogonal-Padé activation functions, which are trainable and lead to faster convergence and higher accuracy in deep neural networks across various datasets and models.
Contribution
It proposes a new class of trainable activation functions called orthogonal-Padé functions, identifying two effective variants, HP-1 and HP-2, that outperform ReLU in multiple deep learning benchmarks.
Findings
HP-1 and HP-2 improve top-1 accuracy over ReLU by 5.06% and 4.63% respectively on CIFAR100 with PreActResNet-34.
Across PreActResNet-34, MobileNet V2, LeNet, and EfficientNet B0 on CIFAR10 and CIFAR100, the reported top-1 gains over ReLU range from 1.78% to 5.06%.
The proposed activations enable faster learning and better performance.
Abstract
We propose orthogonal-Padé activation functions, which are trainable activation functions, and show that they learn faster and improve accuracy on standard deep learning datasets and models. Based on our experiments, we identify the two best candidates out of six orthogonal-Padé activations, which we call safe Hermite-Padé (HP) activation functions, namely HP-1 and HP-2. Compared to ReLU on the CIFAR100 dataset, HP-1 and HP-2 increase top-1 accuracy by 5.06% and 4.63% respectively in PreActResNet-34, and by 3.02% and 2.75% respectively in MobileNet V2. On the CIFAR10 dataset, top-1 accuracy increases by 2.02% and 1.78% respectively in PreActResNet-34, by 2.24% and 2.06% respectively in LeNet, and by 2.15% and 2.03% respectively in EfficientNet B0.
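The abstract does not state the functional form of these activations. As a rough illustration only, the sketch below assumes a "safe" rational function whose numerator and denominator are expansions in probabilists' Hermite polynomials He_n, with absolute values in the denominator so that it never drops below 1 and the function has no poles. The function name safe_hermite_pade, the choice of Hermite family, and the coefficient layout are assumptions for illustration, not the authors' definition of HP-1 or HP-2. In a network the coefficients would be trainable parameters; here they are plain NumPy arrays.

```python
import numpy as np
from numpy.polynomial import hermite_e as H  # probabilists' Hermite polynomials He_n


def safe_hermite_pade(x, num_coeffs, den_coeffs):
    """Evaluate an assumed 'safe' rational function in a Hermite basis.

    Assumed form (hypothetical, not taken from the summary above):
        F(x) = sum_{i=0..k} c_i * He_i(x)
               / (1 + sum_{j=1..l} |d_j| * |He_j(x)|)
    num_coeffs holds c_0..c_k, den_coeffs holds d_1..d_l.
    """
    x = np.asarray(x, dtype=float)
    # Numerator: ordinary Hermite series with coefficients c_0..c_k.
    numerator = H.hermeval(x, num_coeffs)
    # Denominator starts at 1 and only grows, so it is always >= 1.
    denominator = np.ones_like(x)
    for j, d in enumerate(den_coeffs, start=1):
        basis = np.zeros(j + 1)
        basis[j] = 1.0  # selects the single polynomial He_j
        denominator += np.abs(d) * np.abs(H.hermeval(x, basis))
    return numerator / denominator


# Example: numerator He_1(x) = x, denominator 1 + |x|, giving x / (1 + |x|).
values = safe_hermite_pade(np.array([1.0, -3.0]), [0.0, 1.0], [1.0])
```

Because the denominator is bounded below by 1, the output stays finite for any finite input and any coefficient values, which is the property the word "safe" is taken to refer to in this sketch.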
Taxonomy
Topics: Neural Networks and Applications · Blind Source Separation Techniques · Sparse and Compressive Sensing Techniques
Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Batch Normalization · Inverted Residual Block · Dropout · Sigmoid Activation · 1x1 Convolution · Convolution
