TL;DR
Swish-T introduces a Tanh bias to the Swish activation, creating variants that improve neural network performance across diverse tasks and datasets, with empirical validation and publicly available code.
Contribution
The paper proposes Swish-T, a novel activation function family with Tanh bias, demonstrating improved performance and flexibility over the original Swish function.
Findings
Swish-T variants outperform Swish on multiple benchmarks.
Swish-T$_{C}$ achieves high performance even without parameter tuning.
Empirical results on datasets such as MNIST, CIFAR-10, and CIFAR-100 support the effectiveness of the Swish-T family.
Abstract
We propose the Swish-T family, an enhancement of the existing non-monotonic activation function Swish. Swish-T is defined by adding a Tanh bias to the original Swish function. This modification creates a family of Swish-T variants, each designed to excel in different tasks, showcasing specific advantages depending on the application context. The Tanh bias allows for broader acceptance of negative values during initial training stages, offering a smoother non-monotonic curve than the original Swish. We ultimately propose the Swish-T$_{C}$ function, while Swish-T and Swish-T$_{B}$, byproducts of Swish-T$_{C}$, also demonstrate satisfactory performance. Furthermore, our ablation study shows that using Swish-T$_{C}$ as a non-parametric function can still achieve high performance. The superiority of the Swish-T family has been empirically demonstrated…
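To make the phrase "adding a Tanh bias to the original Swish function" concrete, here is a minimal sketch of the base Swish-T variant, not the authors' reference implementation. It assumes the form x * sigmoid(beta * x) + alpha * tanh(x). The parameter names `beta` and `alpha` and their default values are illustrative assumptions, and the other family members (such as Swish-T$_{C}$) are not covered here.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Original Swish: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def swish_t(x: float, beta: float = 1.0, alpha: float = 0.1) -> float:
    """Sketch of Swish-T: Swish plus a scaled tanh bias term.

    Assumption: the bias enters additively as alpha * tanh(x);
    the default alpha and beta are illustrative, not tuned values.
    """
    return swish(x, beta) + alpha * math.tanh(x)


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  swish={swish(x):+.4f}  swish_t={swish_t(x):+.4f}")
```

Under these assumptions, setting `alpha = 0` recovers the original Swish, and for a positive `alpha` the tanh term pushes outputs further below zero for negative inputs, which is one way to read the abstract's "broader acceptance of negative values".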
