ZC-Swish: Stabilizing Deep BN-Free Networks for Edge and Micro-Batch Applications
Suvinava Basak

TL;DR
ZC-Swish is an activation function that stabilizes deep BN-free neural networks by dynamically anchoring activation means near zero. It targets micro-batch and federated learning settings, where Batch Normalization breaks down.
Contribution
The paper introduces ZC-Swish (Zero-Centered Swish), a parameterized drop-in activation function that keeps activation dynamics stable in deep BN-free networks. In the reported stress tests it is more stable and more accurate than standard Swish.
Findings
ZC-Swish maintains stable activations at depths 8, 16, and 32.
Standard Swish collapses to near-random performance at depth 16 and beyond.
ZC-Swish achieves 51.5% test accuracy at depth 16 with seed 42.
Abstract
Batch Normalization (BN) is a cornerstone of deep learning, yet it fundamentally breaks down in micro-batch regimes (e.g., 3D medical imaging) and non-IID Federated Learning. Removing BN from deep architectures, however, often leads to catastrophic training failures such as vanishing gradients and dying channels. We identify that standard activation functions, like Swish and ReLU, exacerbate this instability in BN-free networks due to their non-zero-centered nature, which causes compounding activation mean-shifts as network depth increases. In this technical communication, we propose Zero-Centered Swish (ZC-Swish), a drop-in activation function parameterized to dynamically anchor activation means near zero. Through targeted stress-testing on BN-free convolutional networks at depths 8, 16, and 32, we demonstrate that while standard Swish collapses to near-random performance at depth 16…
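The mean-shift argument in the abstract can be checked numerically for a single layer. The sketch below is illustrative only: it measures the mean output of standard Swish on zero-mean, unit-variance inputs, then applies a hypothetical zero-centered variant, `shifted_swish`, that subtracts a scalar offset. The summary does not give the actual ZC-Swish parameterization, so the subtraction form, the function names, and the way the offset is estimated are all assumptions, not the author's method.

```python
import numpy as np

def swish(x):
    # Standard Swish / SiLU: x * sigmoid(x), written as x / (1 + exp(-x)).
    return x / (1.0 + np.exp(-x))

def shifted_swish(x, shift):
    # HYPOTHETICAL zero-centered variant (assumption, not the paper's formula):
    # subtract a scalar offset so the output mean sits near zero.
    return swish(x) - shift

rng = np.random.default_rng(0)

# Zero-mean, unit-variance inputs, as a stand-in for well-scaled pre-activations.
calibration = rng.standard_normal(1_000_000)
evaluation = rng.standard_normal(1_000_000)

# Swish is not zero-centered: its output mean on zero-mean inputs is positive.
swish_mean = swish(evaluation).mean()

# Estimate the offset on one sample and check it on an independent sample.
shift = swish(calibration).mean()
centred_mean = shifted_swish(evaluation, shift).mean()

print(f"mean of swish(z):         {swish_mean:.4f}")
print(f"mean of shifted_swish(z): {centred_mean:.4f}")
```

In this sketch the offset is a fixed constant estimated from samples. A parameterized activation such as the one the abstract describes would presumably learn or adapt this quantity during training, which is what would let it track the activation statistics at each depth rather than assume unit-Gaussian inputs.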
