Navigating Extremes: Dynamic Sparsity in Large Output Spaces

Nasib Ullah; Erik Schultheis; Mike Lasby; Yani Ioannou; Rohit Babbar

arXiv:2411.03171 · cs.LG · February 11, 2025

TL;DR

This paper applies dynamic sparse training (DST) to classification tasks with very large output spaces, showing how to keep training memory-efficient and accurate with millions of labels on commodity hardware.

Contribution

It shows how to apply DST effectively to large output spaces by addressing the poor gradient flow through the sparse classification layer, enabling end-to-end training with massive label sets.
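Applying DST to the output layer means periodically rewiring a sparse classifier without ever materializing its dense weight matrix. The following is a minimal pure-Python sketch, assuming a fixed fan-in layout (every label keeps exactly the same number of input connections, stored as index/value pairs) and a generic DST rule of magnitude pruning with random regrowth. The function names, toy sizes, drop fraction, and zero initialization of regrown weights are hypothetical choices for illustration; this is not the code in the authors' repository.

```python
import random


def init_fixed_fan_in(num_labels, input_dim, fan_in, rng):
    """Give every label `fan_in` distinct random input connections (hypothetical helper)."""
    indices = [rng.sample(range(input_dim), fan_in) for _ in range(num_labels)]
    weights = [[rng.gauss(0.0, 0.1) for _ in range(fan_in)] for _ in range(num_labels)]
    return indices, weights


def prune_and_regrow(indices, weights, input_dim, drop_fraction, rng):
    """One in-place DST connectivity update, applied label by label (hypothetical helper).

    Each row drops its k smallest-magnitude weights and regrows k new
    connections chosen at random among inputs it does not currently use.
    Regrown weights start at zero. Dropping and regrowing the same number
    per row keeps the fan-in fixed, so the compact layout never changes shape.
    """
    for row_idx, row_w in zip(indices, weights):
        k = int(drop_fraction * len(row_idx))
        if k == 0:
            continue
        # Slot positions holding the k smallest |w| in this row.
        dropped = sorted(range(len(row_w)), key=lambda p: abs(row_w[p]))[:k]
        used = set(row_idx)
        candidates = [j for j in range(input_dim) if j not in used]
        for p, j in zip(dropped, rng.sample(candidates, k)):
            row_idx[p] = j
            row_w[p] = 0.0


# Toy usage: 5 labels, 16 input features, fan-in 4, drop half per update.
rng = random.Random(0)
idx, w = init_fixed_fan_in(num_labels=5, input_dim=16, fan_in=4, rng=rng)
prune_and_regrow(idx, w, input_dim=16, drop_fraction=0.5, rng=rng)
print([len(row) for row in idx])  # [4, 4, 4, 4, 4]
```

The loops are only for readability; a practical PyTorch version would vectorize the per-row selection. Random regrowth is one common choice; gradient-based regrowth is another.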

Findings

01. DST can be applied to classification tasks with millions of labels.

02. Adding an intermediate layer or an auxiliary training objective improves the sparse model's performance.

03. The approach enables training with large label spaces on commodity hardware.
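To make the intermediate-layer finding concrete, here is a minimal forward-pass sketch in pure Python, assuming the head is a dense intermediate layer followed by a fixed fan-in sparse classifier. The dimensions, the ReLU activation, and all names (`dense`, `sparse_fixed_fan_in`, `ENC_DIM`, `HID_DIM`, and so on) are hypothetical; the page does not specify the paper's exact layer sizes or activation.

```python
import random

# Toy, hypothetical sizes: the encoder output has ENC_DIM features, a dense
# intermediate layer maps them to HID_DIM features, and the sparse classifier
# scores NUM_LABELS labels with FAN_IN connections each.
ENC_DIM, HID_DIM, NUM_LABELS, FAN_IN = 8, 16, 6, 4


def dense(x, weight, bias):
    """y = W x + b for a dense layer stored as a list of rows."""
    return [sum(wv * xi for wv, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def sparse_fixed_fan_in(h, indices, values):
    """Label l's score is sum_k values[l][k] * h[indices[l][k]]."""
    return [sum(v * h[j] for j, v in zip(idx_row, val_row))
            for idx_row, val_row in zip(indices, values)]


rng = random.Random(0)
W1 = [[rng.gauss(0, 0.3) for _ in range(ENC_DIM)] for _ in range(HID_DIM)]
b1 = [0.0] * HID_DIM
cls_idx = [rng.sample(range(HID_DIM), FAN_IN) for _ in range(NUM_LABELS)]
cls_val = [[rng.gauss(0, 0.3) for _ in range(FAN_IN)] for _ in range(NUM_LABELS)]

encoder_output = [rng.gauss(0, 1) for _ in range(ENC_DIM)]  # stand-in for a text encoder
hidden = relu(dense(encoder_output, W1, b1))
scores = sparse_fixed_fan_in(hidden, cls_idx, cls_val)
print(len(scores))  # 6
```

The dense intermediate layer adds HID_DIM × ENC_DIM weights regardless of the label count, so with millions of labels its memory cost stays small next to the classifier's.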

Abstract

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory-efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this in practice. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights. In this paper, we leverage recent advances in semi-structured sparse training to apply DST in the domain of classification with large output spaces, where memory efficiency is paramount. With a label space of possibly millions of candidates, the classification layer alone will consume several gigabytes of memory. Switching from a dense to a fixed fan-in sparse layer updated with sparse…
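The claim that the classification layer alone consumes several gigabytes is easy to check with back-of-the-envelope arithmetic. The sketch below compares a dense classifier, a masked dense classifier (how the abstract says most DST implementations simulate sparsity), and a compact fixed fan-in layer. The configuration (3 million labels, 768-dimensional inputs, fp32 weights, int32 indices, a 1-byte mask entry, fan-in 32) and the helper names are hypothetical, not numbers from the paper's experiments.

```python
def classifier_bytes_dense(num_labels, input_dim, bytes_per_value=4):
    """Dense classification layer: one value per (label, input) pair."""
    return num_labels * input_dim * bytes_per_value


def classifier_bytes_masked(num_labels, input_dim, bytes_per_value=4, bytes_per_mask=1):
    """Simulated sparsity: the full dense weight plus a same-shaped mask."""
    return num_labels * input_dim * (bytes_per_value + bytes_per_mask)


def classifier_bytes_fixed_fan_in(num_labels, fan_in, bytes_per_value=4, bytes_per_index=4):
    """Fixed fan-in layer: every label stores `fan_in` values plus `fan_in` input indices."""
    return num_labels * fan_in * (bytes_per_value + bytes_per_index)


# Hypothetical configuration (see lead-in).
L, D, K = 3_000_000, 768, 32
for name, n_bytes in [
    ("dense", classifier_bytes_dense(L, D)),
    ("masked", classifier_bytes_masked(L, D)),
    ("fixed fan-in", classifier_bytes_fixed_fan_in(L, K)),
]:
    print(f"{name}: {n_bytes / 1e9:.2f} GB")
# dense: 9.22 GB
# masked: 11.52 GB
# fixed fan-in: 0.77 GB
```

These figures cover the weights only. Gradients and optimizer state (Adam, for example, keeps two moment estimates per weight) scale with the same per-entry counts, so the gap widens further during training.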

Code & Models

Repositories

xmc-aalto/NeurIPS24-dst
PyTorch · Official

Videos

Navigating Extremes: Dynamic Sparsity in Large Output Spaces · SlidesLive

Taxonomy

Topics: Computational Physics and Python Applications

Methods: Pruning · Dynamic Sparse Training