TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training
Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, Chien-Chin Huang, Iris Zhang, Wei Feng, Howard Huang, Junjie Wang, Sanket Purandare, Gokul Nadathur, Stratos Idreos

TL;DR
TorchTitan is a comprehensive, open-source PyTorch system that simplifies large language model training by unifying advanced techniques, enabling efficient scaling, and providing tools for production readiness and recipe comparison.
Contribution
The paper introduces TorchTitan, a modular, scalable, and interoperable system that integrates state-of-the-art LLM training techniques within PyTorch, streamlining development and optimization.
Findings
Achieved up to 65.08% acceleration with 1D parallelism at 128-GPU scale.
Demonstrated additional speedups of 12.59% and 30% with 2D and 3D parallelism at larger scales.
Validated performance improvements on Llama 3.1 models from 8B to 405B parameters.
Abstract
The development of large language models (LLMs) has been instrumental in advancing state-of-the-art natural language processing applications. Training LLMs with billions of parameters and trillions of tokens requires sophisticated distributed systems that enable composing and comparing several state-of-the-art techniques in order to scale efficiently across thousands of accelerators. However, existing solutions are complex, scattered across multiple libraries/repositories, lack interoperability, and are cumbersome to maintain. Thus, curating and empirically comparing training recipes require non-trivial engineering effort. This paper introduces TorchTitan, an open-source, PyTorch-native distributed training system that unifies state-of-the-art techniques, streamlining integration and reducing overhead. TorchTitan enables 3D parallelism in a modular manner with elastic scaling,…
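As a rough illustration of what "3D parallelism" means in practice, the sketch below splits a pool of GPUs along three axes (pipeline, data, and tensor parallel) and maps each global rank to a coordinate in that grid. This is a plain-Python toy, not TorchTitan's API: the function names, the axis ordering, and the row-major layout are all assumptions made for the example.

```python
def rank_to_coords(rank: int, dp: int, tp: int, pp: int) -> tuple[int, int, int]:
    """Map a global rank to (pp_idx, dp_idx, tp_idx).

    Assumes a row-major layout in which the tensor-parallel index
    varies fastest and the pipeline-parallel index varies slowest.
    """
    world_size = dp * tp * pp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    tp_idx = rank % tp
    dp_idx = (rank // tp) % dp
    pp_idx = rank // (tp * dp)
    return pp_idx, dp_idx, tp_idx


def peers_along(rank: int, dp: int, tp: int, pp: int, axis: str) -> list[int]:
    """Return the ranks that share a communication group with `rank` along `axis`.

    `axis` is one of "pp", "dp", "tp" (hypothetical labels for this sketch).
    """
    sizes = {"pp": pp, "dp": dp, "tp": tp}
    strides = {"pp": dp * tp, "dp": tp, "tp": 1}
    coords = dict(zip(("pp", "dp", "tp"), rank_to_coords(rank, dp, tp, pp)))
    # Drop this rank's offset along the axis, then walk the axis.
    base = rank - coords[axis] * strides[axis]
    return [base + i * strides[axis] for i in range(sizes[axis])]


if __name__ == "__main__":
    # Eight GPUs arranged as 2 pipeline stages x 2 data replicas x 2 tensor shards.
    for r in range(8):
        print(r, rank_to_coords(r, dp=2, tp=2, pp=2))
```

Placing the tensor-parallel axis innermost is a common choice because those groups communicate most often and benefit from sitting on adjacent devices; other layouts are equally valid.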
Code & Models
- 🤗 arcee-ai/AFM-4.5B-Base-Pre-Anneal · model · 4 dl · ♡ 3
- 🤗 arcee-ai/AFM-4.5B-Preview · model · 5 dl · ♡ 5
- 🤗 arcee-ai/AFM-4.5B-GGUF · model · 279 dl · ♡ 30
- 🤗 arcee-ai/AFM-4.5B-Base · model · 18k dl · ♡ 33
- 🤗 arcee-ai/AFM-4.5B · model · 1.3k dl · ♡ 95
- 🤗 Mungert/AFM-4.5B-GGUF · model · 62 dl
- 🤗 lucyknada/arcee-ai_AFM-4.5B-Preview-8bpw-exl3 · model
- 🤗 arcee-ai/AFM-4.5B-ov · model · ♡ 8
- 🤗 AlekseyCalvin/LYRICAL_MT_ru2en_19_AFM45_3epochs · model · 2 dl
Taxonomy
Topics: Metallurgy and Material Forming · Manufacturing Process and Optimization
Methods: LLaMA
