Deep Learning Models on CPUs: A Methodology for Efficient Training

Quchen Fu, Ramesh Chukka, Keith Achorn, Thomas Atta-fosu, Deepak R. Canchi, Zhongwei Teng, Jules White, and Douglas C. Schmidt

arXiv:2206.10034 · cs.LG · June 21, 2023

Open Access

TL;DR

This paper introduces a methodology and toolkit for optimizing deep learning training on CPUs, demonstrating significant performance improvements and providing insights into CPU-based training efficiency.

Contribution

The paper presents a novel optimization workflow and the ProfileDNN toolkit for CPU training, reporting speedups of up to 2x and improved performance profiling capabilities.

Findings

01. 2x training speedup for RetinaNet-ResNext50 on CPUs

02. ProfileDNN enables effective bottleneck identification

03. Custom kernel outperforms reference implementation

Abstract

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when choosing the proper hardware for training. In particular, CPU servers could be beneficial if training on CPUs were more efficient, as they incur fewer hardware update costs and make better use of existing infrastructure. This paper makes several contributions to research on training deep learning models using CPUs. First, it presents a method for optimizing the training of deep learning models on Intel CPUs and a toolkit called ProfileDNN, which we developed to improve performance profiling. Second, we describe a generic training optimization method that guides our workflow and explores several case studies where we identified performance…

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices

Methods: Focal Loss
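
For readers unfamiliar with the method tagged above, the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), can be sketched in plain Python. This is an illustration only, not the paper's implementation or its custom kernel. The function name `focal_loss` and the clamping constant `eps` are hypothetical choices for this sketch, and gamma = 2 with alpha = 0.25 are commonly used defaults assumed here.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for a single prediction (illustrative sketch).

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (positive) or 0 (negative)
    """
    # Clamp p away from 0 and 1 so that log() stays finite (eps is an assumption).
    p = min(max(p, eps), 1.0 - eps)
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss(0.9, 1)  # confident, correct prediction
    hard = focal_loss(0.1, 1)  # confident, wrong prediction
    print("easy:", easy, "hard:", hard)
```

With gamma = 0 and alpha = 1 the expression reduces to ordinary cross-entropy on a positive example, which makes a convenient sanity check. For larger gamma, confidently correct predictions contribute far less loss than hard ones.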