DiT-HC: Enabling Efficient Training of Visual Generation Model DiT on HPC-oriented CPU Cluster
Jinxiao Zhang, Yunpu Xu, Xiyong Wu, Runmin Dong, Shenggan Cheng, Yi Zhao, Mengxuan Chen, Qinrui Zheng, Jianting Liu, Haohuan Fu

TL;DR
This paper introduces DiT-HC, a system for training the DiT generative model efficiently on HPC CPU clusters. It combines communication-free tensor parallelism, optimized GEMM and operator kernels, and overlapped computation and communication, and reports speedups of 8.2× to 87.7× over existing CPU libraries and 90.6% weak scaling efficiency on 256 nodes.
Contribution
DiT-HC is the first system to train and scale the DiT generative model on HPC CPU clusters, integrating new techniques for improved performance and scalability.
Findings
Achieved speedups of 8.2× to 87.7× over existing CPU libraries.
Demonstrated 90.6% weak scaling efficiency on 256 nodes.
Validated the feasibility of large-scale generative model training on CPU clusters.
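For readers unfamiliar with the metric, weak scaling efficiency is commonly computed as measured throughput divided by the throughput that perfect linear scaling would give, with the per-node workload held fixed. The sketch below illustrates that common definition only. The function name, the single-node baseline, and all throughput numbers are hypothetical and chosen so the result lands near the reported figure; this summary does not state how the authors measured it.

```python
def weak_scaling_efficiency(base_nodes, base_throughput, nodes, throughput):
    """Return measured throughput as a fraction of ideal linear scaling.

    Assumes the per-node workload is held constant (weak scaling), so the
    ideal throughput grows in proportion to the node count.
    """
    if base_nodes <= 0 or nodes <= 0 or base_throughput <= 0:
        raise ValueError("node counts and baseline throughput must be positive")
    ideal_throughput = base_throughput * (nodes / base_nodes)
    return throughput / ideal_throughput


# Hypothetical numbers (not from the paper): 100 samples/s on 1 node,
# 23193.6 samples/s on 256 nodes.
efficiency = weak_scaling_efficiency(
    base_nodes=1, base_throughput=100.0, nodes=256, throughput=23193.6
)
print(f"weak scaling efficiency: {efficiency:.1%}")
```

Under this definition, an efficiency of 90.6% on 256 nodes means the cluster delivers roughly 232 nodes' worth of ideal throughput relative to the chosen baseline.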
Abstract
Generative foundation models have become an important tool for data reconstruction and simulation in scientific computing, showing a tight integration with traditional numerical simulations. At the same time, with the development of new hardware features, such as matrix acceleration units and high-bandwidth memory, CPU-based clusters offer promising opportunities to accelerate and scale such models, facilitating the unification of artificial intelligence and scientific computing. We present DiT-HC, the first system to train and scale the generative model DiT on a next-generation HPC CPU cluster. DiT-HC introduces three key techniques: (1) communication-free tensor parallelism (CFTP) with AutoMem for automated memory-aware dataflow, (2) HCOps, a suite of optimized GEMM and operator kernels leveraging vector and matrix acceleration units, and (3) a custom MPI backend that overlaps…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis
