Improving compute efficacy frontiers with SliceOut

Pascal Notin; Aidan N. Gomez; Joanna Yoo; Yarin Gal

arXiv:2007.10909 · cs.LG · April 2, 2021 · 1 citation

TL;DR

SliceOut is a dropout-inspired method that speeds up deep learning training by exploiting GPU memory layout: it drops contiguous sets of units at random, yielding 10-40% speedups and memory savings with minimal to no loss in test accuracy.

Contribution

This paper introduces SliceOut, a dropout-inspired scheme that drops contiguous sets of units at random so that training operates on smaller tensors, taking advantage of GPU memory layout to train deep learning models faster.

Findings

1. Achieves 10-40% speedups in training time.

2. Reduces memory usage during training.

3. Maintains comparable test accuracy.

Abstract

Pushing forward the compute efficacy frontier in deep learning is critical for tasks that require frequent model re-training or workloads that entail training a large number of models. We introduce SliceOut -- a dropout-inspired scheme designed to take advantage of GPU memory layout to train deep learning models faster without impacting final test accuracy. By dropping contiguous sets of units at random, our method realises training speedups through (1) fast memory access and matrix multiplication of smaller tensors, and (2) memory savings by avoiding allocating memory to zero units in weight gradients and activations. At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy. We demonstrate 10-40% speedups and memory reduction with Wide ResNets, EfficientNets, and Transformer models, with minimal to no loss…
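The core mechanism described in the abstract (dropping a contiguous block of units so that the matrix multiplication runs on a smaller tensor) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `sliceout_linear`, the `keep_frac` parameter, and the inverted-dropout-style rescaling are assumptions made for illustration only.

```python
import numpy as np

def sliceout_linear(x, W, b, keep_frac, rng, training=True):
    """Hypothetical linear layer that keeps one contiguous block of output units.

    Returns the layer output and the (start, end) indices of the kept block.
    """
    n_out = W.shape[1]
    if not training:
        # Test time: SliceOut is turned off and the full layer is used.
        return x @ W + b, (0, n_out)
    n_keep = max(1, int(round(keep_frac * n_out)))
    # One random start index per call; the kept units are contiguous.
    start = int(rng.integers(0, n_out - n_keep + 1))
    end = start + n_keep
    # Slicing selects a contiguous block, so the matmul below works on a
    # smaller tensor and no zeros are stored for the dropped units.
    y = x @ W[:, start:end] + b[start:end]
    # ASSUMPTION: rescale as in inverted dropout; the source abstract does
    # not specify the normalisation.
    return y / (n_keep / n_out), (start, end)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))    # batch of 4 inputs with 8 features
W = rng.normal(size=(8, 10))   # layer with 10 output units
b = np.zeros(10)

y_train, kept = sliceout_linear(x, W, b, keep_frac=0.5, rng=rng)
y_test, _ = sliceout_linear(x, W, b, keep_frac=0.5, rng=rng, training=False)
print(y_train.shape, y_test.shape)  # (4, 5) (4, 10)
```

In this sketch a layer of width `n_out` with a fixed slice size admits `n_out - n_keep + 1` distinct slices, which is one way to read the abstract's "linear number of architectures"; standard dropout, by contrast, samples from exponentially many masks but still multiplies full-size tensors.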

Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Multi-Head Attention · Softmax · Label Smoothing · Dropout · Byte Pair Encoding · Adam