Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training

Michael Benington; Leo Phan; Chris Pierre Paul; Evan Shoemaker; Priyanka Ranade; Torstein Collett; Grant Hodgson Perez; Christopher Krieger

arXiv:2310.05350 · cs.DC · October 12, 2023 · 1 citation

TL;DR

This paper investigates how different parallelism strategies, especially Microsoft DeepSpeed ZeRO stages, affect the efficiency of training large language models with up to 13 billion parameters.
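The ZeRO stages can be made concrete with a per-GPU memory estimate based on the accounting in the original ZeRO paper (Rajbhandari et al.). With mixed-precision Adam, each parameter costs 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of optimizer state. Stages 1, 2, and 3 successively partition the optimizer states, the gradients, and the parameters across the data-parallel GPUs. This is an illustrative sketch, not the paper's own measurement method. The function name and the 64-GPU setting are hypothetical, and activations, communication buffers, and memory fragmentation are ignored.

```python
# Illustrative sketch (hypothetical helper, not from the paper): per-GPU
# model-state memory under DeepSpeed ZeRO stages, using the ZeRO paper's
# mixed-precision Adam accounting. Activations and buffers are ignored.
PARAM_BYTES = 2   # fp16 parameters
GRAD_BYTES = 2    # fp16 gradients
OPTIM_BYTES = 12  # fp32 master weights + Adam momentum + Adam variance


def zero_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU model-state memory in GB (1 GB = 1e9 bytes)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    p, g, o = PARAM_BYTES, GRAD_BYTES, OPTIM_BYTES
    if stage >= 1:
        o = o / num_gpus  # stage 1: partition optimizer states
    if stage >= 2:
        g = g / num_gpus  # stage 2: also partition gradients
    if stage >= 3:
        p = p / num_gpus  # stage 3: also partition parameters
    return num_params * (p + g + o) / 1e9


if __name__ == "__main__":
    # Hypothetical example: a 13B-parameter model on 64 data-parallel GPUs.
    for stage in range(4):
        gb = zero_model_state_gb(13e9, num_gpus=64, stage=stage)
        print(f"ZeRO stage {stage}: ~{gb:.1f} GB of model states per GPU")
```

For a 13-billion-parameter model spread over a hypothetical 64 GPUs, this estimate gives about 208 GB per GPU without ZeRO, about 54 GB at stage 1, about 29 GB at stage 2, and about 3 GB at stage 3. The numbers show how each stage shrinks the per-GPU footprint of model states. The extra communication that higher stages introduce is not modeled here.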

Contribution

It provides a detailed analysis of parallelism techniques for large-scale LLM pre-training, focusing on optimizing data processing and resource utilization.
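In DeepSpeed, the ZeRO stage and the batch layout are chosen through a JSON configuration. The following minimal sketch builds such a configuration as a Python dict and enforces DeepSpeed's requirement that the global batch size equal the micro-batch size per GPU times the gradient-accumulation steps times the data-parallel size. The helper name and the example values (micro-batch 4, 8 accumulation steps, 64 data-parallel ranks, stage 2) are hypothetical and are not settings reported by the paper.

```python
import json


def build_deepspeed_config(micro_batch_per_gpu: int, grad_accum_steps: int,
                           data_parallel_size: int, zero_stage: int) -> dict:
    """Return a minimal DeepSpeed-style config dict (hypothetical helper).

    DeepSpeed expects train_batch_size ==
    train_micro_batch_size_per_gpu * gradient_accumulation_steps * data-parallel size.
    """
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2, or 3")
    global_batch = micro_batch_per_gpu * grad_accum_steps * data_parallel_size
    return {
        "train_batch_size": global_batch,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {"enabled": True},               # mixed-precision training
        "zero_optimization": {"stage": zero_stage},  # which ZeRO stage to use
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    cfg = build_deepspeed_config(4, 8, 64, zero_stage=2)
    print(json.dumps(cfg, indent=2))
```

The resulting dict can be written to a JSON file or passed as the config to DeepSpeed's initialization. Sweeping `zero_stage` while holding the batch layout fixed is one simple way to compare stages on equal footing.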

Findings

01. Quantified relationships between parallelism methods.

02. Evaluated the efficiency of ZeRO stages in large-model training.

03. Provided insights into scalable LLM pre-training strategies.

Abstract

AI accelerator processing capabilities and memory constraints largely dictate the scale at which machine learning workloads (e.g., training and inference) can be executed within a desirable time frame. Training a state-of-the-art transformer-based model today requires GPU-accelerated high-performance computers with high-speed interconnects. As datasets and models continue to grow in size, the computational and memory demands of AI grow with them. These challenges have inspired the development of distributed algorithms and circuit-based optimization techniques that make it possible to progressively scale models in multi-node environments, efficiently minimize neural network cost functions for faster convergence, and fit more parameters into a fixed set of available resources. In our research project, we focus on parallel and distributed machine learning…

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Natural Language Processing Techniques

Methods: Focus