Improving compute efficacy frontiers with SliceOut
Pascal Notin, Aidan N. Gomez, Joanna Yoo, Yarin Gal

TL;DR
SliceOut is a novel dropout-inspired method that accelerates deep learning training by exploiting GPU memory layout, achieving significant speedups and memory savings without compromising accuracy.
Contribution
This paper introduces SliceOut, a new technique leveraging GPU memory layout to improve training speed and efficiency in deep learning models.
Findings
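Achieves 10-40% training speedups on Wide ResNets, EfficientNets, and Transformer models.
Reduces training memory usage by not allocating memory for zeroed units in weight gradients and activations.
Maintains comparable test accuracy, with minimal to no loss reported.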
Abstract
Pushing forward the compute efficacy frontier in deep learning is critical for tasks that require frequent model re-training or workloads that entail training a large number of models. We introduce SliceOut -- a dropout-inspired scheme designed to take advantage of GPU memory layout to train deep learning models faster without impacting final test accuracy. By dropping contiguous sets of units at random, our method realises training speedups through (1) fast memory access and matrix multiplication of smaller tensors, and (2) memory savings by avoiding allocating memory to zero units in weight gradients and activations. At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy. We demonstrate 10-40% speedups and memory reduction with Wide ResNets, EfficientNets, and Transformer models, with minimal to no loss…
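The core idea in the abstract, dropping a contiguous block of units so that the layer works on smaller dense tensors, can be sketched in a few lines. This is a minimal illustration on plain Python lists, not the authors' implementation: the names `sliceout_linear` and `num_slices`, the `keep_prob` parameter, and the inverted-dropout style rescaling by `d_out / kept` are all assumptions made for the example.

```python
import random


def sliceout_linear(x, weight, bias, keep_prob, training, rng=random):
    """Hypothetical SliceOut-style linear layer on plain Python lists.

    x:      input vector of length d_in
    weight: list of d_out rows, each of length d_in
    bias:   list of d_out floats
    Returns (outputs, start); start is the index of the first kept unit,
    or None when the layer runs at test time.
    """
    d_out = len(weight)
    if not training:
        # Test time: slicing is turned off and every unit is used.
        full = [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(weight, bias)]
        return full, None

    # Keep one contiguous block of units instead of a scattered mask.
    kept = max(1, round(keep_prob * d_out))
    start = rng.randint(0, d_out - kept)  # randint is inclusive on both ends

    # Slice first, then multiply: only `kept` rows are touched, so the result
    # is a shorter dense vector rather than a full vector padded with zeros.
    scale = d_out / kept  # assumed rescaling, borrowed from inverted dropout
    out = [scale * (sum(w * v for w, v in zip(row, x)) + b)
           for row, b in zip(weight[start:start + kept],
                             bias[start:start + kept])]
    return out, start


def num_slices(d_out, keep_prob):
    """Number of distinct contiguous slices for one layer (no wrap-around)."""
    kept = max(1, round(keep_prob * d_out))
    return d_out - kept + 1
```

Under these assumptions, a layer with 8 output units and `keep_prob=0.5` yields a 4-element output during training and has 5 possible slice positions. The count grows linearly with layer width, which is one way to read the abstract's "linear number of architectures".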
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Multi-Head Attention · Softmax · Label Smoothing · Dropout · Byte Pair Encoding · Adam
