Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates

Itamar Tsayag; Ofir Lindenbaum

arXiv:2603.08914 · cs.LG · March 11, 2026

Open Access

TL;DR

This paper introduces a fully differentiable method that uses continuously relaxed Bernoulli gates to discover sparse subnetworks in randomly initialized neural networks, reaching up to 90% sparsity with minimal accuracy loss and thereby cutting memory and computation costs.

Contribution

The paper presents what the authors describe as, to their knowledge, the first fully differentiable approach to Strong Lottery Ticket (SLT) discovery. It avoids non-differentiable gradient estimators and iterative pruning cycles, which makes network sparsification more scalable.

Findings

01. Achieves up to 90% sparsity with minimal accuracy loss

02. Nearly doubles the sparsity compared to edge-popup at similar accuracy

03. Works across various architectures including CNNs and Vision Transformers

Abstract

Over-parameterized neural networks incur prohibitive memory and computational costs for resource-constrained deployment. The Strong Lottery Ticket (SLT) hypothesis suggests that randomly initialized networks contain sparse subnetworks achieving competitive accuracy without weight training. Existing SLT methods, notably edge-popup, rely on non-differentiable score-based selection, limiting optimization efficiency and scalability. We propose using continuously relaxed Bernoulli gates to discover SLTs through fully differentiable, end-to-end optimization, training only gating parameters while keeping all network weights frozen at their initialized values. Continuous relaxation enables direct gradient-based optimization of an $\ell_0$-regularization objective, eliminating the need for non-differentiable gradient estimators or iterative pruning cycles. To our knowledge, this is the first…
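To make the gating idea in the abstract concrete, here is a minimal sketch in plain Python of a single linear layer whose frozen random weights are masked by relaxed gates. The abstract does not specify the relaxation, so this sketch assumes one common choice, a clipped Gaussian gate z = clip(mu + sigma * eps, 0, 1), with the expected number of open gates used as a differentiable stand-in for the $\ell_0$ penalty. The names `sample_gate`, `expected_open_gates`, and `gated_linear`, the layer sizes, and the value of `sigma` are all hypothetical and are not taken from the paper.

```python
import math
import random


def sample_gate(mu, sigma, rng):
    """Draw one relaxed gate in [0, 1].

    Assumed relaxation (not confirmed by the abstract):
    z = clip(mu + sigma * eps, 0, 1) with eps ~ N(0, 1).
    """
    eps = rng.gauss(0.0, 1.0)
    return min(1.0, max(0.0, mu + sigma * eps))


def expected_open_gates(mus, sigma):
    """Smooth surrogate for the l0 penalty under the assumed relaxation.

    P(mu + sigma * eps > 0) equals the standard normal CDF at mu / sigma,
    so the sum over all gates is the expected count of non-zero gates.
    """
    return sum(
        0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0))))
        for row in mus
        for m in row
    )


def gated_linear(x, weights, gates):
    """Compute y_j = sum_i x_i * w_ij * z_ij with frozen weights w."""
    n_in, n_out = len(weights), len(weights[0])
    return [
        sum(x[i] * weights[i][j] * gates[i][j] for i in range(n_in))
        for j in range(n_out)
    ]


rng = random.Random(0)

# Frozen weights: drawn once at initialization and never updated.
weights = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]

# Gate parameters: in this sketch, the only quantities a trainer would update.
mus = [[0.5 for _ in range(3)] for _ in range(4)]
sigma = 0.5  # hypothetical noise scale

gates = [[sample_gate(m, sigma, rng) for m in row] for row in mus]
x = [1.0, -2.0, 0.5, 3.0]

y = gated_linear(x, weights, gates)
penalty = expected_open_gates(mus, sigma)
```

In a full training loop, an automatic-differentiation framework would minimize the task loss on `y` plus a multiple of `penalty` with respect to `mus` only. Gates whose `mu` is driven well below zero close almost surely, which is how a sparse subnetwork would emerge under this assumed parameterization.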

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning