Towards a Better Theoretical Understanding of Independent Subnetwork Training

Egor Shulgin, Peter Richtárik

arXiv:2306.16484 · cs.LG · June 5, 2024 · 1 citation

Open Access

TL;DR

This paper provides a theoretical analysis of Independent Subnetwork Training (IST), a technique for scalable neural network training, highlighting its differences from other distributed methods and analyzing its optimization performance on quadratic models.

Contribution

It provides a detailed theoretical analysis of IST, distinguishing it from other distributed training approaches and characterizing its optimization behavior on quadratic models.

Findings

01. IST differs fundamentally from compressed-communication methods.

02. The theoretical analysis characterizes IST's optimization performance on quadratic models.

03. The paper highlights advantages of IST for large-scale neural network training.
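Finding 01 can be made concrete with a toy quadratic. What follows is a minimal sketch under stated assumptions, not the method or setting analyzed in the paper: two hypothetical workers each own one coordinate of x, a submodel is obtained by zeroing the other coordinates (no rescaling), and all names (`compressed_gradient_step`, `ist_step`, `run`) are illustrative. In the compressed-gradient step the gradient is evaluated at the full model and then restricted to each worker's block; in the IST-style step each worker evaluates the gradient at its own submodel, so the coupling terms of A between blocks are lost.

```python
# Hypothetical toy contrast (not the paper's algorithm): a compressed-gradient
# step versus an IST-style step on f(x) = 0.5 * x^T A x - b^T x, where each
# worker owns a disjoint block of coordinates.

def matvec(A, x):
    """Dense matrix-vector product A @ x for lists of floats."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def grad(A, b, x):
    """Gradient of the quadratic: A x - b."""
    return [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]


def mask(x, block):
    """Submodel: keep the coordinates in `block`, zero out all others."""
    return [x_j if j in block else 0.0 for j, x_j in enumerate(x)]


def compressed_gradient_step(A, b, x, blocks, lr):
    """Each worker takes the gradient at the FULL model and keeps only its block."""
    g = grad(A, b, x)
    new_x = list(x)
    for block in blocks:
        for j in block:
            new_x[j] = x[j] - lr * g[j]
    return new_x


def ist_step(A, b, x, blocks, lr):
    """Each worker takes the gradient at its own SUBMODEL (masked x)."""
    new_x = list(x)
    for block in blocks:
        g = grad(A, b, mask(x, block))
        for j in block:
            new_x[j] = x[j] - lr * g[j]
    return new_x


def run(step, iters=200, lr=0.1):
    A = [[2.0, 1.0], [1.0, 2.0]]  # coupled quadratic (nonzero off-diagonal terms)
    b = [1.0, 1.0]
    blocks = [{0}, {1}]           # two hypothetical workers, one coordinate each
    x = [0.0, 0.0]
    for _ in range(iters):
        x = step(A, b, x, blocks, lr)
    return x


if __name__ == "__main__":
    print("compressed gradient:", run(compressed_gradient_step))
    print("IST-style:          ", run(ist_step))
```

In this toy, the compressed-gradient iteration (which reduces to plain gradient descent because the blocks partition the coordinates) approaches the minimizer (1/3, 1/3), whereas the IST-style iteration settles at (1/2, 1/2), the solution of the block-diagonal system A_ii x_i = b_i. It only shows that the two updates differ when blocks are coupled; it does not reproduce the paper's analysis.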

Abstract

Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective…


Taxonomy

Topics: Neural Networks and Applications · Advanced Neural Network Applications · Machine Learning and ELM