Efficient Sparse Training with Structured Dropout
Andy Lo

TL;DR
This paper introduces SparseDrop, a structured dropout method that creates hardware-friendly sparsity, enabling faster training on GPUs while maintaining regularisation effectiveness comparable to standard dropout.
Contribution
The paper presents SparseDrop, a novel structured dropout technique with a CUDA implementation that achieves speed-ups on GPUs and retains regularisation benefits.
Findings
SparseDrop achieves GPU speed-ups over its dense counterpart even at low sparsity levels.
SparseDrop regularises about as well as standard dropout, and sometimes better.
The source code is publicly available for reproducibility.
Abstract
Dropout is a common regularisation technique in deep learning that improves generalisation. Even though it introduces sparsity and thus potential for higher throughput, it usually cannot bring speed-ups on GPUs due to its unstructured nature. In this project, I experiment with SparseDrop, a structured, hardware-friendly variant of dropout that can exploit such sparsity. I provide a CUDA implementation of SparseDrop, achieving speed-ups against its dense counterpart even at low sparsity levels. The empirical results demonstrate that SparseDrop provides regularisation properties similar to, or sometimes even better than, those of standard dropout. This suggests its potential as a drop-in replacement for standard dropout with faster training speeds. The source code is available at https://github.com/andylolu2/sparse-dropout
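To illustrate why structured sparsity can be exploited where unstructured sparsity cannot, here is a minimal pure-Python sketch of block-wise dropout. It is not the paper's CUDA kernel: the tile-level mask, the block size, the inverted-dropout rescaling, and all function names (`block_keep_mask`, `dense_dropout_matmul`, `block_sparse_matmul`) are assumptions made for illustration. The point is only that when whole tiles are dropped together, a matrix multiply can skip them entirely instead of multiplying by zeros.

```python
import random

def block_keep_mask(n_row_blocks, n_col_blocks, p, rng):
    """Sample one keep/drop decision per tile (True = keep). Hypothetical helper."""
    return [[rng.random() >= p for _ in range(n_col_blocks)]
            for _ in range(n_row_blocks)]

def dense_dropout_matmul(x, w, keep, block, p):
    """Reference path: zero the dropped tiles of x, rescale, then do a full matmul."""
    scale = 1.0 / (1.0 - p)  # inverted-dropout rescaling (an assumption here)
    m, k, n = len(x), len(w), len(w[0])
    masked = [[x[i][j] * scale if keep[i // block][j // block] else 0.0
               for j in range(k)] for i in range(m)]
    return [[sum(masked[i][t] * w[t][c] for t in range(k)) for c in range(n)]
            for i in range(m)]

def block_sparse_matmul(x, w, keep, block, p):
    """Same result as the reference, but dropped tiles are skipped, not multiplied."""
    scale = 1.0 / (1.0 - p)
    m, n = len(x), len(w[0])
    out = [[0.0] * n for _ in range(m)]
    for bi, mask_row in enumerate(keep):
        for bj, kept in enumerate(mask_row):
            if not kept:
                continue  # the whole tile contributes nothing, so no work is done
            for i in range(bi * block, (bi + 1) * block):
                for t in range(bj * block, (bj + 1) * block):
                    xv = x[i][t] * scale
                    for c in range(n):
                        out[i][c] += xv * w[t][c]
    return out

# Small example: a 4x4 input split into 2x2 tiles, multiplied by a 4x3 weight.
rng = random.Random(0)
block, p = 2, 0.5
x = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(4)]
w = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
keep = block_keep_mask(2, 2, p, rng)
dense = dense_dropout_matmul(x, w, keep, block, p)
sparse = block_sparse_matmul(x, w, keep, block, p)
```

With element-wise (unstructured) dropout the zeros are scattered, so every tile still contains some non-zero entries and nothing can be skipped at tile granularity. In this sketch the skipped work is proportional to the number of dropped tiles.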
Taxonomy
Topics: Machine Learning and Data Classification
Methods: Dropout
