MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Zhengqing Yuan; Hanchi Sun; Lichao Sun; Yanfang Ye

arXiv:2604.05091·cs.CL·April 8, 2026

TL;DR

MegaTrain is a memory-centric system that enables full-precision training of models exceeding 100 billion parameters on a single GPU by leveraging host memory and optimized streaming techniques.

Contribution

MegaTrain introduces a memory-centric training system that works around GPU memory limits for extremely large models by keeping model state in host memory and streaming it to the GPU layer by layer.

Findings

01. Trains 120B-parameter models on a single GPU.

02. Achieves 1.84x the throughput of DeepSpeed ZeRO-3 with CPU offloading.

03. Enables training of 7B models with a 512k-token context on a single GH200.
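
To see why a 120B-parameter model cannot simply live on the GPU, a rough back-of-the-envelope estimate helps. The sketch below is illustrative only: the function name `training_state_gb`, the 4-byte (FP32) parameter size, and the two extra optimizer copies (as in Adam-style optimizers) are assumptions, since the listing does not state the optimizer or the exact precision layout.

```python
def training_state_gb(num_params, bytes_per_param=4, optimizer_copies=2, with_grads=True):
    """Rough size of training state in GB (10^9 bytes).

    Assumptions (not taken from the paper): every copy uses
    bytes_per_param bytes per parameter, and the optimizer keeps
    optimizer_copies extra tensors per parameter.
    """
    copies = 1 + (1 if with_grads else 0) + optimizer_copies
    return num_params * bytes_per_param * copies / 1e9


# Weights alone for a 120B-parameter model at 4 bytes per parameter.
weights_only = training_state_gb(120_000_000_000, optimizer_copies=0, with_grads=False)

# Weights + gradients + two optimizer tensors under the same assumption.
full_state = training_state_gb(120_000_000_000)

print(weights_only, full_state)
```

Under these assumptions the weights alone come to hundreds of gigabytes, which already exceeds what a single accelerator typically holds on-device; this is the motivation for placing persistent state in host memory.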

Abstract

We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. Unlike traditional GPU-centric systems, MegaTrain stores parameters and optimizer states in host memory (CPU memory) and treats GPUs as transient compute engines. For each layer, we stream parameters in and compute gradients out, minimizing persistent device state. To battle the CPU-GPU bandwidth bottleneck, we adopt two key optimizations. 1) We introduce a pipelined double-buffered execution engine that overlaps parameter prefetching, computation, and gradient offloading across multiple CUDA streams, enabling continuous GPU execution. 2) We replace persistent autograd graphs with stateless layer templates, binding weights dynamically as they stream in, eliminating persistent graph metadata while providing flexibility in scheduling. On a single…
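
The two optimizations in the abstract can be illustrated with a small CPU-only analogy. This is a minimal sketch, not the MegaTrain implementation: a background thread stands in for a CUDA copy stream, Python tuples stand in for weight tensors, and all names (`layer_template`, `streamed_forward`, `host_params`) are hypothetical. It covers only the forward pass; gradient offloading is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def layer_template(weights, x):
    """Stateless layer: weights are bound at call time, nothing is stored."""
    w, b = weights
    return w * x + b


def streamed_forward(host_params, x):
    """Run layers in order while prefetching the next layer's weights.

    host_params: list of (w, b) pairs kept in "host memory".
    Two buffer slots are reused in turn (double buffering): while
    layer i computes from slot i % 2, layer i + 1 loads into the
    other slot.
    """
    buffers = [None, None]

    def load(i):
        # Stand-in for an asynchronous host-to-device copy.
        buffers[i % 2] = tuple(host_params[i])

    n = len(host_params)
    if n == 0:
        return x

    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(load, 0)
        for i in range(n):
            pending.result()  # wait until layer i's weights have arrived
            if i + 1 < n:
                # Prefetch the next layer; this overlaps with the compute below.
                pending = copier.submit(load, i + 1)
            x = layer_template(buffers[i % 2], x)
    return x


host_params = [(2.0, 1.0), (3.0, 0.0), (1.0, -4.0)]
print(streamed_forward(host_params, 1.0))  # 5.0
```

Only two layers' worth of weights are resident at any time, regardless of model depth. Because `layer_template` holds no state of its own, the same function serves every layer, which loosely mirrors the idea of binding weights dynamically as they stream in.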

Code & Models

Repositories

dlyuangod/MegaTrain (GitHub)
