Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures
Evangelos Georganas, Sasikanth Avancha, Kunal Banerjee, Dhiraj Kalamkar, Greg Henry, Hans Pabst, Alexander Heinecke

TL;DR
This paper presents JIT-generated direct convolution kernels for x86 CPUs (Xeon and Xeon Phi). Depending on the setting and CPU architecture, the kernels reach close to theoretical peak performance, and they deliver high image throughput when integrated into a lightweight multi-node graph execution model.
Contribution
It introduces a JIT-based (dynamically compiled) direct convolution implementation for x86 architectures and demonstrates both its efficiency and its integration into a lightweight multi-node graph execution model.
Findings
Near-theoretical peak performance on CPUs
High image-throughput in deep learning tasks
Effective multi-node CPU execution
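"Near-theoretical peak" is typically judged by comparing a layer's achieved FLOP rate with the machine's peak FLOP rate. A worked formulation, assuming stride 1 and counting one multiply plus one add per multiply-accumulate (the symbol names are our own choice, not necessarily the paper's notation):

```latex
% N: minibatch size, C / K: input / output channels,
% P x Q: output spatial size, R x S: filter size, t: measured layer time
\mathrm{FLOPs} = 2\,N\,K\,C\,P\,Q\,R\,S,
\qquad
\text{efficiency} = \frac{\mathrm{FLOPs}/t}{F_{\text{peak}}},
\qquad
F_{\text{peak}} = n_{\text{cores}} \cdot f_{\text{clock}} \cdot (\text{FLOPs per cycle per core})
```

The per-core FLOPs-per-cycle term depends on the SIMD width and the number of FMA units of the CPU at hand, which is why the attainable fraction of peak varies across architectures.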
Abstract
Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs) which provide state-of-the-art results for tasks like image recognition, neural machine translation and speech recognition. The computationally expensive nature of a convolution operation has led to the proliferation of implementations including matrix-matrix multiplication formulation, and direct convolution primarily targeting GPUs. In this paper, we introduce direct convolution kernels for x86 architectures, in particular for Xeon and XeonPhi systems, which are implemented via a dynamic compilation approach. Our JIT-based implementation shows close to theoretical peak performance, depending on the setting and the CPU architecture at hand. We additionally demonstrate how these JIT-optimized kernels can be integrated into a lightweight multi-node graph execution…
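The abstract contrasts two ways of computing a convolution: a matrix-matrix multiplication (GEMM) formulation and direct convolution. The sketch below shows that both produce the same output. It assumes a single image, stride 1, no padding, an NCHW layout and plain Python lists. The function names `conv_direct`, `conv_im2col_gemm` and `_shapes` are hypothetical. The code is scalar and models neither SIMD vectorization nor JIT code generation, so it is not the authors' kernel.

```python
# Hypothetical sketch: single image, stride 1, no padding, NCHW layout,
# nested Python lists. Scalar code only; no SIMD or JIT modeling.


def _shapes(x, w):
    """Return (C, H, W, K, R, S, P, Q) for input x[C][H][W] and filters w[K][C][R][S]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    return C, H, W, K, R, S, H - R + 1, W - S + 1


def conv_direct(x, w):
    """Direct convolution: out[k][p][q] = sum over c, r, s of w[k][c][r][s] * x[c][p+r][q+s]."""
    C, H, W, K, R, S, P, Q = _shapes(x, w)
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):              # output channels
        for c in range(C):          # input channels (reduction)
            for p in range(P):      # output rows
                for q in range(Q):  # output columns
                    acc = 0.0
                    for r in range(R):      # filter rows
                        for s in range(S):  # filter columns
                            acc += w[k][c][r][s] * x[c][p + r][q + s]
                    out[k][p][q] += acc
    return out


def conv_im2col_gemm(x, w):
    """GEMM formulation: unfold patches into a (C*R*S) x (P*Q) matrix, multiply by
    the K x (C*R*S) filter matrix, then fold the result back to K x P x Q."""
    C, H, W, K, R, S, P, Q = _shapes(x, w)
    # im2col: row index enumerates (c, r, s); column index is j = p * Q + q.
    cols = [[x[c][p + r][q + s] for p in range(P) for q in range(Q)]
            for c in range(C) for r in range(R) for s in range(S)]
    # Filter matrix: row k is w[k] flattened in the same (c, r, s) order.
    fmat = [[w[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
            for k in range(K)]
    crs = C * R * S
    gemm = [[sum(fmat[k][i] * cols[i][j] for i in range(crs)) for j in range(P * Q)]
            for k in range(K)]
    # Fold each length-P*Q row back into a P x Q output plane.
    return [[gemm[k][p * Q:(p + 1) * Q] for p in range(P)] for k in range(K)]


if __name__ == "__main__":
    x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # C=1, 3x3 input
    w = [[[[1, 0], [0, -1]]]]                # K=1, C=1, 2x2 filter
    print(conv_direct(x, w))       # -> [[[-4.0, -4.0], [-4.0, -4.0]]]
    print(conv_im2col_gemm(x, w))  # -> [[[-4, -4], [-4, -4]]]
```

The im2col step materializes a (C·R·S)×(P·Q) matrix that copies each input pixel up to R·S times before calling GEMM. A direct kernel computes the same sums without that extra buffer, at the cost of having to organize the loop nest for the hardware itself.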
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Brain Tumor Detection and Classification
