Joint Training Across Multiple Activation Sparsity Regimes

Haotian Wang

arXiv:2603.03131·cs.LG·March 4, 2026

TL;DR

This paper proposes a training strategy that cycles a single neural network through multiple activation sparsity levels, testing the hypothesis that representations which stay effective in both dense and sparse regimes generalize better.

Contribution

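It introduces a joint training method across multiple activation sparsity regimes that combines global top-k constraints on hidden activations with cyclic training (progressive compression and periodic reset). In single-run CIFAR-10 experiments it outperforms dense baseline training.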
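The global top-k constraint mentioned above can be illustrated with a minimal NumPy sketch. The page does not say how "global" is scoped or how activations are ranked, so the following are assumptions: the top-k is taken per sample over all units of one hidden activation tensor, ranking is by raw value rather than magnitude, and the names `global_topk_mask` and `keep_ratio` are hypothetical rather than the paper's API.

```python
import numpy as np


def global_topk_mask(activations: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the k largest activations of each sample and zero out the rest.

    Assumption: "global" means across every unit of this tensor (all
    channels and spatial positions), computed separately per sample.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    batch = activations.shape[0]
    flat = activations.reshape(batch, -1)      # (batch, units)
    n = flat.shape[1]
    k = max(1, int(round(keep_ratio * n)))     # activation budget per sample
    if k >= n:
        return activations.copy()              # dense regime: nothing is masked

    # After partitioning at position n - k, the last k indices of each row
    # point at the k largest values (in no particular order).
    top_idx = np.argpartition(flat, n - k, axis=1)[:, n - k:]
    mask = np.zeros(flat.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)

    return np.where(mask, flat, 0.0).reshape(activations.shape)
```

With `keep_ratio=1.0` the function returns the input unchanged, so the dense regime is the limiting case of the same operation. In an actual training loop the masking would need to sit inside an autodiff framework so that gradients flow only through the kept units; this sketch covers the forward selection only.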

Findings

01

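Two adaptive keep-ratio control strategies both outperform dense baseline training in single-run experiments.

02

Joint training across multiple activation sparsity regimes may improve generalization.

03

Preliminary results on CIFAR-10 (WRN-28-4 backbone, no data augmentation) show promising gains.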

Abstract

Generalization in deep neural networks remains only partially understood. Inspired by the stronger generalization tendency of biological systems, we explore the hypothesis that robust internal representations should remain effective across both dense and sparse activation regimes. To test this idea, we introduce a simple training strategy that applies global top-k constraints to hidden activations and repeatedly cycles a single model through multiple activation budgets via progressive compression and periodic reset. Using CIFAR-10 without data augmentation and a WRN-28-4 backbone, we find in single-run experiments that two adaptive keep-ratio control strategies both outperform dense baseline training. These preliminary results suggest that joint training across multiple activation sparsity regimes may provide a simple and effective route to improved generalization.
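The cycle of progressive compression and periodic reset described in the abstract can be pictured as a schedule of keep ratios. The sketch below is a fixed, non-adaptive stand-in and is not either of the paper's two adaptive control strategies, whose details are not given here. The geometric decay rule and the names `cyclic_keep_ratios`, `decay`, and `min_ratio` are all assumptions made for illustration.

```python
def cyclic_keep_ratios(num_steps: int, decay: float = 0.5,
                       min_ratio: float = 0.1) -> list[float]:
    """Return one keep ratio per step (for example, per epoch).

    Hypothetical rule: start dense (1.0), multiply by `decay` each step
    (progressive compression), and jump back to 1.0 once the next value
    would fall below `min_ratio` (periodic reset).
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be in (0, 1)")
    if not 0.0 < min_ratio <= 1.0:
        raise ValueError("min_ratio must be in (0, 1]")

    ratios = []
    ratio = 1.0
    for _ in range(num_steps):
        ratios.append(ratio)
        compressed = ratio * decay
        # Reset to the dense regime instead of dropping below the floor.
        ratio = 1.0 if compressed < min_ratio else compressed
    return ratios


print(cyclic_keep_ratios(6))  # [1.0, 0.5, 0.25, 0.125, 1.0, 0.5]
```

Each value would be used as the activation budget for the top-k constraint during that step, so a single model repeatedly visits both dense and sparse regimes over the course of training.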

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques