Performance of Small Language Model Pretraining on FABRIC: An Empirical Study

Praveen Rao

arXiv:2602.02632 · cs.LG · March 23, 2026

TL;DR

This study evaluates pretraining techniques for small language models on commodity GPU clusters, analyzing parallelism strategies and network effects to optimize training performance and resource usage.

Contribution

It provides a systematic approach for selecting pretraining methods for small LLMs considering hardware and network constraints, based on extensive empirical testing.

Findings

01. Alpa's execution plans outperform others in distributed settings.

02. Network latency significantly impacts pretraining efficiency.

03. Optimized parallelism strategies reduce training time and resource consumption.

Abstract

Large language models (LLMs) require enormous computing power to pretrain on massive datasets. When only limited datasets are available, smaller LLMs are a better choice to pretrain (on user-specified datasets), in line with the scaling laws of LLMs. Using pretrained models, vector embeddings can be generated for raw data and stored in vector databases to support modern AI applications and semantic search. In this work, we investigate the performance of pretraining techniques for smaller LLMs on an experimental testbed (with commodity GPUs) available to academic users at no charge. We consider data parallelism, intra-operator parallelism, inter-operator/pipeline parallelism, and their combinations for pretraining. We set up different GPU clusters with homogeneous and heterogeneous GPU hardware. Furthermore, we investigate the impact of network latency on pretraining…
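
The abstract's point that limited data favors smaller models can be illustrated with a rough calculation. The sketch below is not taken from the paper: it assumes the widely cited rule of thumb of roughly 20 training tokens per parameter from compute-optimal scaling studies, and the common approximation that training costs about 6 FLOPs per parameter per token. The function names and the constant are hypothetical, chosen only for illustration.

```python
# Rough sizing sketch under stated assumptions (not the paper's method).
# Assumption: ~20 training tokens per model parameter is near compute-optimal.
# Assumption: training cost is approximately 6 * parameters * tokens FLOPs.

TOKENS_PER_PARAM = 20.0  # hypothetical default; a rule of thumb, not a law


def compute_optimal_params(dataset_tokens: float,
                           tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Estimate a model size (in parameters) suited to a dataset of the given size."""
    if dataset_tokens <= 0 or tokens_per_param <= 0:
        raise ValueError("dataset_tokens and tokens_per_param must be positive")
    return dataset_tokens / tokens_per_param


def approx_training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute using the 6 * N * D estimate."""
    return 6.0 * params * tokens


if __name__ == "__main__":
    tokens = 2e9  # e.g. a 2-billion-token user-specified dataset
    params = compute_optimal_params(tokens)
    flops = approx_training_flops(params, tokens)
    print(f"suggested size: {params:.3g} parameters, about {flops:.3g} training FLOPs")
```

Under these assumptions, a dataset of a few billion tokens points to a model on the order of a hundred million parameters, which is the regime where commodity GPU clusters become a plausible option.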

Taxonomy

Topics: Natural Language Processing Techniques · Big Data and Digital Economy · Topic Modeling