maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs
Andrew Lavin

TL;DR
maxDNN is a convolution kernel optimized for NVIDIA Maxwell GPUs. It reaches 96.3% computational efficiency on typical deep learning network architectures by combining ideas from cuda-convnet2 with the Maxas SGEMM assembly code.
Contribution
The paper introduces maxDNN, a convolution kernel for Maxwell GPUs that builds on the cuda-convnet2 design and the assembly-level Maxas SGEMM code to reach 96.3% computational efficiency on the forward propagation (FPROP) operation.
Findings
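Achieves 96.3% computational efficiency on typical deep learning network architectures
Combines ideas from cuda-convnet2 with the Maxas SGEMM assembly code
Addresses only forward propagation (FPROP); the author expects the same techniques to be effective for backward propagation (BPROP)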
Abstract
This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures. The design combines ideas from cuda-convnet2 with the Maxas SGEMM assembly code. We only address the forward propagation (FPROP) operation of the network, but we believe that the same techniques used here will be effective for backward propagation (BPROP) as well.
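To make the abstract's terms concrete, here is a minimal reference sketch in Python of what an FPROP convolution computes and how a computational-efficiency figure could be derived from it. This is not the maxDNN kernel, which is written in Maxwell assembly. The function names (`conv_fprop`, `conv_flops`, `efficiency`), the stride-1 and no-padding simplifications, and the definition of efficiency as achieved arithmetic throughput divided by device peak throughput are all assumptions made for illustration.

```python
def conv_fprop(images, filters):
    """Naive forward convolution (stride 1, no padding).

    images:  nested lists shaped [N][C][H][W]
    filters: nested lists shaped [K][C][R][S]
    Returns nested lists shaped [N][K][H-R+1][W-S+1].
    The filter is not flipped, i.e. this is cross-correlation,
    which is what deep learning libraries usually call convolution.
    """
    N, C = len(images), len(images[0])
    H, W = len(images[0][0]), len(images[0][0][0])
    K = len(filters)
    R, S = len(filters[0][0]), len(filters[0][0][0])
    P, Q = H - R + 1, W - S + 1  # output height and width

    out = [[[[0.0] * Q for _ in range(P)] for _ in range(K)] for _ in range(N)]
    for n in range(N):
        for k in range(K):
            for p in range(P):
                for q in range(Q):
                    acc = 0.0
                    for c in range(C):
                        for r in range(R):
                            for s in range(S):
                                acc += images[n][c][p + r][q + s] * filters[k][c][r][s]
                    out[n][k][p][q] = acc
    return out


def conv_flops(N, C, K, P, Q, R, S):
    """Floating-point operations for the loop nest above.

    Each multiply-accumulate is counted as 2 operations.
    """
    return 2 * N * K * P * Q * C * R * S


def efficiency(flops, seconds, peak_flops_per_second):
    """Assumed definition: achieved throughput / device peak throughput."""
    return (flops / seconds) / peak_flops_per_second
```

Under this assumed definition, an efficiency of 0.963 would mean the kernel sustains 96.3% of the device's peak arithmetic rate while running the layer. A production kernel computes the same sums as the loop nest but reorganizes them for the GPU's memory hierarchy and instruction scheduling.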
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Convolution
