Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates
Itamar Tsayag, Ofir Lindenbaum

TL;DR
This paper introduces a fully differentiable method using relaxed Bernoulli gates to efficiently discover sparse subnetworks in neural networks, significantly reducing memory and computation costs while maintaining accuracy.
Contribution
It presents the first fully differentiable approach for Strong Lottery Ticket discovery that avoids non-differentiable estimators and iterative pruning, enabling scalable network sparsification.
Findings
Achieves up to 90% sparsity with minimal accuracy loss
Nearly doubles the sparsity compared to edge-popup at similar accuracy
Works across various architectures including CNNs and Vision Transformers
Abstract
Over-parameterized neural networks incur prohibitive memory and computational costs for resource-constrained deployment. The Strong Lottery Ticket (SLT) hypothesis suggests that randomly initialized networks contain sparse subnetworks achieving competitive accuracy without weight training. Existing SLT methods, notably edge-popup, rely on non-differentiable score-based selection, limiting optimization efficiency and scalability. We propose using continuously relaxed Bernoulli gates to discover SLTs through fully differentiable, end-to-end optimization, training only gating parameters while keeping all network weights frozen at their initialized values. Continuous relaxation enables direct gradient-based optimization of an $\ell_0$-regularization objective, eliminating the need for non-differentiable gradient estimators or iterative pruning cycles. To our knowledge, this is the first…
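The gating idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which relaxation the authors use, so the Gaussian-based clipped gate below is an assumption, as are the names `relaxed_gate`, `expected_l0`, and `gated_forward` and the fixed noise scale `sigma`. It shows only the forward pass and the sparsity penalty, with the weights left untouched; it is not the paper's implementation.

```python
import math
import random


def relaxed_gate(mu, sigma, rng):
    """Sample relaxed gates z_i = clip(mu_i + sigma * eps_i, 0, 1), eps_i ~ N(0, 1).

    Assumption: a Gaussian-based relaxation; the paper may use a different one.
    """
    return [min(1.0, max(0.0, m + sigma * rng.gauss(0.0, 1.0))) for m in mu]


def expected_l0(mu, sigma):
    """Expected number of open gates: sum_i P(z_i > 0) = sum_i Phi(mu_i / sigma)."""
    return sum(0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mu)


def gated_forward(weights, gates, x):
    """Linear unit whose frozen weights are masked elementwise by the gates."""
    return sum(w * z * xi for w, z, xi in zip(weights, gates, x))


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(8)]  # frozen at initialization
frozen_copy = list(weights)

mu = [0.5] * 8  # gate parameters: the only trainable quantities in this sketch
sigma = 0.5     # hypothetical fixed noise scale

gates = relaxed_gate(mu, sigma, rng)
x = [1.0] * 8
output = gated_forward(weights, gates, x)

# A training objective would combine a task loss on `output` with
# lam * expected_l0(mu, sigma) and update only `mu` by gradient descent.
penalty = expected_l0(mu, sigma)
```

Because the penalty is a smooth function of `mu`, it can be minimized with ordinary gradient steps, which is the property the abstract contrasts with score-based selection in edge-popup.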
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
