RAF: Holistic Compilation for Deep Learning Model Training
Cody Hao Yu, Haozheng Fan, Guangtai Huang, Zhen Jia, Yizhi Liu, Jie Wang, Zach Zheng, Yuan Zhou, Haichen Shen, Junru Shao, Mu Li, Yida Wang

TL;DR
RAF is a comprehensive deep learning compiler that optimizes training workflows by generating training graphs, consolidating graph optimizations, and integrating kernel implementations to improve throughput and memory efficiency.
Contribution
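RAF introduces a holistic compilation approach for training deep learning models. It brings automatic differentiation and automatic mixed precision into the compiler itself, whereas existing deep learning compilers mainly target inference and do not cover these steps for training workloads.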
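To make the idea of turning a forward model into a training graph concrete, here is a minimal sketch in plain Python. It is not RAF's API or implementation: the tuple-based instruction format, the op names (`mul`, `add`, `const_one`), and the helpers `append_backward` and `run` are all hypothetical, chosen only to illustrate reverse-mode automatic differentiation expressed as a graph-to-graph pass.

```python
# Toy IR (an assumption for illustration): each instruction is
# (output_name, op, input_names). This is NOT RAF's real IR.
FORWARD = [
    ("t0", "mul", ("w", "x")),     # t0 = w * x
    ("loss", "add", ("t0", "b")),  # loss = t0 + b
]


def append_backward(forward, loss, params):
    """Return one graph holding forward and backward instructions,
    plus a map from each parameter to the name of its gradient."""
    graph = list(forward)
    grad_of = {loss: "d_" + loss}
    graph.append((grad_of[loss], "const_one", ()))  # d(loss)/d(loss) = 1
    counter = 0

    def accumulate(var, contrib):
        # Sum contributions when a value is used more than once.
        nonlocal counter
        if var in grad_of:
            total = f"acc{counter}"
            counter += 1
            graph.append((total, "add", (grad_of[var], contrib)))
            grad_of[var] = total
        else:
            grad_of[var] = contrib

    # Walk the forward instructions in reverse and emit gradient ops.
    for out, op, ins in reversed(forward):
        if out not in grad_of:
            continue  # this value does not influence the loss
        g = grad_of[out]
        if op == "add":
            for v in ins:
                accumulate(v, g)
        elif op == "mul":
            a, b = ins
            for v, other in ((a, b), (b, a)):
                name = f"g{counter}"
                counter += 1
                graph.append((name, "mul", (g, other)))
                accumulate(v, name)
        else:
            raise ValueError(f"no gradient rule for op {op!r}")
    return graph, {p: grad_of[p] for p in params}


def run(graph, inputs):
    """Interpret the toy graph and return every computed value."""
    env = dict(inputs)
    for out, op, ins in graph:
        if op == "const_one":
            env[out] = 1.0
        elif op == "add":
            env[out] = env[ins[0]] + env[ins[1]]
        elif op == "mul":
            env[out] = env[ins[0]] * env[ins[1]]
        else:
            raise ValueError(f"unknown op {op!r}")
    return env


training_graph, grads = append_backward(FORWARD, "loss", ("w", "b"))
env = run(training_graph, {"w": 3.0, "x": 2.0, "b": 1.0})
print(env["loss"], env[grads["w"]], env[grads["b"]])
```

Because the forward and backward instructions end up in a single list, a later pass in this sketch could rewrite across both halves at once, which is the kind of consolidation the holistic approach aims for.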
Findings
Achieves better training throughput than PyTorch, XLA, and DeepSpeed.
Enables larger batch sizes with optimized memory usage.
Demonstrates improved performance on transformer models.
Abstract
As deep learning has become pervasive in modern applications, many deep learning frameworks have been introduced to help practitioners develop and train DNN models rapidly. Meanwhile, as training large deep learning models has become a trend in recent years, training throughput and memory footprint have become crucial. Accordingly, optimizing training workloads with compiler optimizations is inevitable and is attracting more and more attention. However, existing deep learning compilers (DLCs) mainly target inference and do not incorporate holistic optimizations, such as automatic differentiation and automatic mixed precision, in training workloads. In this paper, we present RAF, a deep learning compiler for training. Unlike existing DLCs, RAF accepts a forward model and generates a training graph in-house. Accordingly, RAF is able to systematically consolidate graph optimizations for…
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Topic Modeling
