LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural Networks Based on Graphics Processing Units
Guangli Li, Lei Liu, Xueying Wang, Xiu Ma, Xiaobing Feng

TL;DR
LANCE is an efficient low-precision quantized Winograd convolution method for GPUs that speeds up convolution by up to 2.40x over full-precision convolution with minimal accuracy loss.
Contribution
It presents a novel low-precision quantized Winograd convolution algorithm for GPUs that embeds linear quantization into the Winograd domain, combining fast convolution with quantization to improve efficiency.
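For context, the standard Winograd formulation of F(2×2, 3×3) computes a 2×2 output tile Y from a 4×4 input tile d and a 3×3 filter g using fixed transform matrices B, G, and A. The second and third lines are a sketch, under the assumption of symmetric 8-bit linear quantization with scales s_U and s_V, of what "embedding quantization into the Winograd domain" can look like: the transformed tiles U and V are quantized, multiplied in low precision, and rescaled before the output transform. The notation Q_s, s_U, s_V is illustrative and not taken from the paper.

```latex
Y = A^{\top}\left[(G\,g\,G^{\top}) \odot (B^{\top} d\,B)\right] A

U = G\,g\,G^{\top}, \quad V = B^{\top} d\,B, \quad
Q_s(x) = \operatorname{clip}\!\left(\operatorname{round}(x / s),\, -127,\, 127\right)

Y \approx A^{\top}\left[\, s_U\, s_V \left( Q_{s_U}(U) \odot Q_{s_V}(V) \right) \right] A
```

Per output tile, direct convolution needs 2·2·3·3 = 36 multiplications, while the element-wise product in the Winograd domain needs 4·4 = 16, a 2.25x reduction in multiplications. In a multi-channel layer the element-wise product becomes a sum over input channels at each of the 16 Winograd-domain positions, i.e. a batch of matrix multiplications, which is the part that benefits from low-precision arithmetic.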
Findings
Up to 2.40x speedup over full-precision convolution.
Effective 8-bit quantization with minimal accuracy loss.
Validated on SVHN, CIFAR, and ImageNet datasets.
Abstract
Accelerating deep convolutional neural networks is an active research topic that has attracted interest from both academia and industry. In this paper, we propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques. By embedding linear quantization operations into the Winograd domain, the fast convolution can be performed efficiently under low-precision computation on graphics processing units. We test neural network models with LANCE on representative image classification datasets, including SVHN, CIFAR, and ImageNet. The experimental results show that our 8-bit quantized Winograd convolution improves performance by up to 2.40x over full-precision convolution with trivial accuracy loss.
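The pipeline the abstract describes (transform, quantize in the Winograd domain, multiply in low precision, dequantize, inverse transform) can be sketched in plain Python for the 1-D case F(2,3). This is a minimal illustration under stated assumptions: a single channel, symmetric 8-bit scaling calibrated on the max absolute value of each transformed tile, and hypothetical helper names (`quantize`, `winograd_f23`, `quantized_winograd_f23`). It is not the LANCE GPU kernel, whose calibration and data layout are not described here.

```python
"""Sketch: 1-D Winograd F(2,3) with linear int8 quantization applied in the
Winograd domain. Illustrative only; not the LANCE GPU implementation."""

# Standard F(2,3) transform matrices (input, filter, output).
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quantize(x, bits=8):
    """Assumed scheme: symmetric linear quantization, x ~= scale * q,
    with integer q clipped to [-qmax, qmax] and scale from max |x|."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in x)
    scale = peak / qmax if peak > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in x]
    return q, scale


def direct_conv(d, g):
    """Reference: 2 outputs of a valid 1-D correlation of a 4-element
    input tile with a 3-tap filter (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Full-precision Winograd F(2,3): 4 multiplications per tile."""
    U = matvec(G, g)   # filter transform (precomputable per filter)
    V = matvec(BT, d)  # input transform
    M = [u * v for u, v in zip(U, V)]
    return matvec(AT, M)  # output transform


def quantized_winograd_f23(d, g, bits=8):
    """Winograd F(2,3) with quantization embedded in the Winograd domain."""
    U = matvec(G, g)
    V = matvec(BT, d)
    qU, sU = quantize(U, bits)
    qV, sV = quantize(V, bits)
    # Integer element-wise product; real hardware would accumulate
    # int8 x int8 products in a wider integer type.
    qM = [a * b for a, b in zip(qU, qV)]
    # Dequantize with the product of the two scales, then transform back.
    M = [sU * sV * m for m in qM]
    return matvec(AT, M)
```

With d = [1.0, 2.0, 3.0, 4.0] and g = [0.5, -1.0, 0.25], the full-precision Winograd result matches direct convolution ([-0.75, -1.0]), and the 8-bit version lands within a few hundredths of it. Quantizing after the transforms, rather than before, keeps the integer multiply in the step where Winograd has already cut the multiplication count.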
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
Methods: Convolution
