FCDP: Fully Cached Data Parallel for Communication-Avoiding Large-Scale Training
Gyeongseo Park, Eungyeong Lee, Song-woo Sok, Myung-Hoon Cha, Kwangwon Koh, Baik-Song An, Hongyeon Kim, Ki-Dong Kang

TL;DR
FCDP introduces a caching strategy that leverages host memory as a fast cache to significantly reduce inter-node communication in large-scale GPU training, enabling higher throughput on commodity hardware.
Contribution
FCDP presents a novel communication-avoiding method that caches parameters in host memory, reducing inter-node communication by up to 99% while maintaining minimal GPU memory usage.
Findings
Achieves up to 100x higher throughput than ZeRO-3.
Reduces inter-node all-gather by 50%.
Maintains the maximum batch size of ZeRO-3 on commodity clusters.
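One simple way a host-memory cache can halve inter-node all-gather traffic: ZeRO-3 all-gathers each layer's parameters once in the forward pass and again in the backward pass. The toy model below is a minimal sketch, not FCDP's implementation. It assumes, hypothetically, that each layer's gathered parameters are copied to host memory after the forward all-gather and copied back over PCIe in the backward pass. The names `Traffic`, `zero3_step`, and `host_cached_step` are illustrative, and it counts bytes per rank while ignoring intra-node traffic.

```python
from dataclasses import dataclass


@dataclass
class Traffic:
    inter_node_bytes: int = 0  # bytes one rank receives from GPUs on other nodes
    pcie_bytes: int = 0        # bytes one rank moves between GPU and host memory


def remote_bytes(shard_bytes: int, num_nodes: int, gpus_per_node: int) -> int:
    # Shards owned by ranks on the other (num_nodes - 1) nodes must cross the network.
    return shard_bytes * (num_nodes - 1) * gpus_per_node


def full_bytes(shard_bytes: int, num_nodes: int, gpus_per_node: int) -> int:
    # Size of one layer's fully gathered parameters.
    return shard_bytes * num_nodes * gpus_per_node


def zero3_step(shards: list[int], num_nodes: int, gpus_per_node: int) -> Traffic:
    t = Traffic()
    for s in shards:            # forward: all-gather each layer, use it, free it
        t.inter_node_bytes += remote_bytes(s, num_nodes, gpus_per_node)
    for s in reversed(shards):  # backward: all-gather each layer again
        t.inter_node_bytes += remote_bytes(s, num_nodes, gpus_per_node)
    return t


def host_cached_step(shards: list[int], num_nodes: int, gpus_per_node: int) -> Traffic:
    # Hypothetical policy: keep the forward all-gather result in host memory.
    t = Traffic()
    host_cache: dict[int, int] = {}  # layer index -> cached size in bytes
    for i, s in enumerate(shards):
        t.inter_node_bytes += remote_bytes(s, num_nodes, gpus_per_node)
        host_cache[i] = full_bytes(s, num_nodes, gpus_per_node)
        t.pcie_bytes += host_cache[i]  # GPU -> host copy after the forward gather
    for i in reversed(range(len(shards))):
        t.pcie_bytes += host_cache.pop(i)  # host -> GPU copy replaces the re-gather
    return t
```

With four layers of 64 MiB shards on 4 nodes of 8 GPUs, `zero3_step` counts 12 GiB of inter-node traffic per rank and `host_cached_step` counts 6 GiB, at the cost of 16 GiB of added PCIe traffic. Whether that trade pays off depends on how PCIe bandwidth compares with the inter-node link.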
Abstract
Training billion-parameter models requires distributing model states across GPUs using fully sharded data parallel (i.e., ZeRO-3). While ZeRO-3 succeeds on clusters with high-bandwidth NVLink and InfiniBand interconnects, researchers with commodity hardware face severe inter-node all-gather bottlenecks. Existing optimizations take two approaches: GPU memory caching (MiCS, ZeRO++) trades memory capacity for reduced communication, triggering out-of-memory failures on large models; host memory offloading (ZeRO-Offload, ZeRO-Infinity) extends capacity but degrades throughput due to PCIe overhead. We observe that on bandwidth-limited clusters, host memory can serve not as an overflow tier but as a fast caching layer that outperforms inter-node communication. Based on this insight, we propose FCDP, which eliminates redundant inter-node communication while preserving ZeRO-3's minimal GPU…
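The abstract's key claim is that on bandwidth-limited clusters, reading cached parameters back from host memory over PCIe can be faster than re-fetching them across nodes. A minimal back-of-envelope sketch of that comparison follows. It is an idealized size-over-bandwidth model that ignores latency, contention, and collective-algorithm details. The function names and the example bandwidths (25 Gb/s Ethernet, i.e. 3.125 GB/s, versus an assumed 25 GB/s effective PCIe rate) are hypothetical and are not figures reported for FCDP.

```python
def transfer_seconds(nbytes: float, gb_per_s: float) -> float:
    # Idealized transfer time: size / bandwidth, ignoring latency and contention.
    return nbytes / (gb_per_s * 1e9)


def backward_fetch_choice(
    remote_bytes: float, full_bytes: float, net_gb_per_s: float, pcie_gb_per_s: float
) -> tuple[str, float]:
    # Re-gathering pulls only the remote shards over the inter-node link;
    # reloading from host memory moves the whole layer over PCIe.
    regather = transfer_seconds(remote_bytes, net_gb_per_s)
    from_host = transfer_seconds(full_bytes, pcie_gb_per_s)
    if from_host < regather:
        return "host_cache", from_host
    return "all_gather", regather


if __name__ == "__main__":
    # 1.5 GB of remote shards, 2 GB full layer, slow network vs. fast PCIe.
    print(backward_fetch_choice(1.5e9, 2e9, net_gb_per_s=3.125, pcie_gb_per_s=25.0))
```

Under these assumed numbers the host path wins (0.08 s versus 0.48 s for re-gathering). On a cluster with a much faster interconnect (for example, an assumed 100 GB/s), the same function picks the all-gather instead, which matches the abstract's point that the benefit is specific to bandwidth-limited clusters.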
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
