The Quest of Finding the Antidote to Sparse Double Descent
Victor Quétu, Marta Milovanović

TL;DR
This paper investigates the sparse double descent phenomenon in deep learning models, proposing regularization techniques including knowledge distillation to avoid performance deterioration due to sparsity.
Contribution
It introduces a novel learning scheme with knowledge distillation to effectively mitigate sparse double descent in deep models.
Findings
L2 regularization can reduce sparse double descent, but at the cost of a worse performance/sparsity trade-off.
Knowledge distillation effectively prevents sparse double descent without sacrificing sparsity.
Experimental results confirm the proposed method's effectiveness in image classification tasks.
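To make the distillation finding concrete, here is a minimal sketch of a knowledge-distillation loss in its standard temperature-scaled form (hard-label cross-entropy plus a KL term toward the teacher). This summary does not give the paper's exact objective, so the function names, the `temperature` and `alpha` parameters, and their default values are illustrative assumptions, not the authors' settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Assumed standard KD objective; the paper's exact loss may differ.

    (1 - alpha) * CE(student, label) + alpha * T^2 * KL(teacher_T || student_T)
    """
    # Hard-label cross-entropy on the student's unsoftened prediction.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # KL divergence between the softened teacher and student distributions.
    p_teacher_t = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher_t, p_student_t) if t > 0)

    # The T^2 factor keeps the soft-target gradient scale comparable across temperatures.
    return (1 - alpha) * cross_entropy + alpha * temperature ** 2 * kl


# Example: a (sparse) student that disagrees with its teacher on a 3-class input.
loss = distillation_loss([2.0, 0.5, -1.0], [0.1, 3.0, -0.5], label=1)
```

In a pruning pipeline the student would be the sparsified model and the teacher a dense reference model; how the two are paired across sparsity levels is left open here because the summary does not specify it.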
Abstract
In energy-efficient schemes, finding the optimal size of deep learning models is very important and has a broad impact. Meanwhile, recent studies have reported an unexpected phenomenon, the sparse double descent: as the model's sparsity increases, the performance first worsens, then improves, and finally deteriorates. Such a non-monotonic behavior raises serious questions about the optimal model's size to maintain high performance: the model needs to be sufficiently over-parametrized, but having too many parameters wastes training resources. In this paper, we aim to find the best trade-off efficiently. More precisely, we tackle the occurrence of the sparse double descent and present some solutions to avoid it. Firstly, we show that a simple regularization method can help to mitigate this phenomenon but sacrifices the performance/sparsity compromise. To overcome this problem,…
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Machine Learning and Algorithms
