Top-KAST: Top-K Always Sparse Training

Siddhant M. Jayakumar; Razvan Pascanu; Jack W. Rae; Simon Osindero,; Erich Elsen

arXiv:2106.03517·cs.LG·June 8, 2021·21 cites

Top-KAST: Top-K Always Sparse Training

Siddhant M. Jayakumar, Razvan Pascanu, Jack W. Rae, Simon Osindero,, Erich Elsen

PDF

Open Access 2 Repos 1 Video

TL;DR

Top-KAST introduces a sparse training method that maintains constant sparsity throughout training, enabling efficient large-scale model training with comparable or better performance on benchmarks like ImageNet and language modeling tasks.

Contribution

It presents a simple, effective approach for sparse training that avoids dense computations, facilitating scalable and resource-efficient training of large models.

Findings

01

Performs comparably or better than previous methods on ImageNet

02

Enables training of large language models with fewer resources

03

Easy to implement in existing frameworks

Abstract

Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint. Unfortunately, most existing methods for inducing performant sparse models still entail the instantiation of dense parameters, or dense gradients in the backward-pass, during training. For very large models this requirement can be prohibitive. In this work we propose Top-KAST, a method that preserves constant sparsity throughout training (in both the forward and backward-passes). We demonstrate the efficacy of our approach by showing that it performs comparably to or better than previous works when training models on the established ImageNet benchmark, whilst fully maintaining sparsity. In addition to our ImageNet results, we also demonstrate our approach in the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Videos

Top-KAST: Top-K Always Sparse Training· slideslive

Taxonomy

TopicsAdvanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications