TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training

Wanchao Liang; Tianyu Liu; Less Wright; Will Constable; Andrew Gu; Chien-Chin Huang; Iris Zhang; Wei Feng; Howard Huang; Junjie Wang; Sanket Purandare; Gokul Nadathur; Stratos Idreos

arXiv:2410.06511 · cs.CL · June 10, 2025

TL;DR

TorchTitan is an open-source, PyTorch-native distributed training system that simplifies large language model (LLM) pre-training by unifying state-of-the-art techniques, enabling efficient scaling, and providing tools for production readiness and recipe comparison.

Contribution

The paper introduces TorchTitan, a modular, scalable, and interoperable system that integrates state-of-the-art LLM training techniques within PyTorch, streamlining development and optimization.

Findings

01

Achieved a 65.08% speedup over optimized baselines with 1D parallelism at the 128-GPU scale (Llama 3.1 8B) on NVIDIA H100 GPUs.

02

Demonstrated additional speedups of 12.59% with 2D parallelism at the 256-GPU scale (Llama 3.1 70B) and 30% with 3D parallelism at the 512-GPU scale (Llama 3.1 405B).

03

Validated performance improvements on Llama 3.1 models from 8B to 405B parameters.

Abstract

The development of large language models (LLMs) has been instrumental in advancing state-of-the-art natural language processing applications. Training LLMs with billions of parameters and trillions of tokens requires sophisticated distributed systems that enable composing and comparing several state-of-the-art techniques in order to efficiently scale across thousands of accelerators. However, existing solutions are complex, scattered across multiple libraries/repositories, lack interoperability, and are cumbersome to maintain. Thus, curating and empirically comparing training recipes require non-trivial engineering effort. This paper introduces TorchTitan, an open-source, PyTorch-native distributed training system that unifies state-of-the-art techniques, streamlining integration and reducing overhead. TorchTitan enables 3D parallelism in a modular manner with elastic scaling,…
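Composing parallelism dimensions into "3D parallelism" amounts to laying the GPUs out on a multi-dimensional device mesh and forming one process group per mesh dimension. The plain-Python sketch below illustrates only that bookkeeping, under stated assumptions: a row-major rank layout (the same layout a mesh built with PyTorch's `torch.distributed.device_mesh.init_device_mesh` uses), a dimension order of (pp, dp, tp), and helper names `mesh_coords` and `groups_along` that are hypothetical and not part of TorchTitan or PyTorch. It launches no distributed job and is not TorchTitan's implementation.

```python
import math

# Hypothetical helpers for illustration; not TorchTitan or PyTorch APIs.
# Assumed mesh dimension order: (pp, dp, tp), with tp innermost.


def mesh_coords(rank: int, shape: tuple[int, ...]) -> tuple[int, ...]:
    """Return the row-major (C-order) coordinates of a global rank in the mesh."""
    coords = []
    for size in reversed(shape):  # peel off the innermost dimension first
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def groups_along(dim: int, shape: tuple[int, ...]) -> list[list[int]]:
    """Return every process group for one mesh dimension.

    Ranks share a group when they agree on all coordinates except `dim`.
    """
    groups: dict[tuple[int, ...], list[int]] = {}
    for rank in range(math.prod(shape)):
        c = mesh_coords(rank, shape)
        key = c[:dim] + c[dim + 1:]  # coordinates on every other dimension
        groups.setdefault(key, []).append(rank)
    return list(groups.values())


if __name__ == "__main__":
    shape = (2, 2, 2)  # toy mesh: pp=2, dp=2, tp=2 (8 ranks)
    print(mesh_coords(5, shape))   # (1, 0, 1)
    print(groups_along(2, shape))  # tp groups: [[0, 1], [2, 3], [4, 5], [6, 7]]
    print(groups_along(1, shape))  # dp groups: [[0, 2], [1, 3], [4, 6], [5, 7]]
    print(groups_along(0, shape))  # pp groups: [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Placing tp innermost gives each tensor-parallel group contiguous ranks; assuming 8 GPUs per node, a hypothetical (8, 8, 8) mesh over 512 GPUs would keep every TP group inside one node, where its frequent collectives can use the fast intra-node interconnect. That split is illustrative only and is not claimed to be the configuration reported in the paper.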

Videos

TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining (SlidesLive)

Taxonomy

Methods: LLaMA