Optimizing Winograd Convolution on ARMv8 processors
Haoyuan Gui, Xiaoyu Zhang, Chong Zhang, Zitong Su, Huiyuan Li

TL;DR
This paper presents a fused Winograd Convolution algorithm optimized for ARMv8 CPUs that significantly accelerates CNN computations by integrating transformations and computation into a single pipeline with manual assembly tuning.
Contribution
It introduces a novel fused Winograd algorithm with a custom data layout and multi-dimensional parallel strategy, achieving substantial speedups over existing libraries on ARMv8 platforms.
Findings
Achieves up to 10.57x speedup over existing libraries on Kunpeng 920.
Demonstrates significant performance gains on AWS Graviton2 and Phytium 2000+ platforms.
Combines hand-tuned AArch64 assembly kernels with a custom z-shaped data layout that keeps memory access consecutive.
Abstract
As Convolutional Neural Networks (CNNs) gain prominence in deep learning, algorithms like Winograd Convolution have been introduced to enhance computational efficiency. However, existing implementations often face challenges such as high transformation overhead, suboptimal computation efficiency, and reduced parallel performance in some layers. We propose a fused Winograd Convolution algorithm optimized for ARMv8 CPUs, integrating input transformation, filter transformation, computation, and output transformation into a single pipeline. By maintaining consecutive memory access and using a custom z-shaped data layout, our approach fully utilizes an optimized GEMM micro-kernel with a ping-pong technique. Additionally, we introduce a multi-dimensional parallel strategy that adapts to convolutional layer scales. To maximize performance, we manually optimize each kernel in AArch64 assembly…
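The four stages named in the abstract (filter transformation, input transformation, computation, output transformation) can be illustrated with the smallest textbook case, 1D Winograd F(2,3), which produces two outputs of a 3-tap filter using four multiplications instead of six. This is a minimal sketch for orientation only, not the authors' fused ARMv8 kernel: the function names are hypothetical, the tile size is an assumption, and the z-shaped layout, GEMM micro-kernel, and ping-pong technique are not modelled.

```python
def filter_transform(g):
    """Transform a 3-tap filter into the 4-element Winograd domain."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def input_transform(d):
    """Transform a 4-element input tile into the Winograd domain."""
    d0, d1, d2, d3 = d
    return [d0 - d2, d1 + d2, d2 - d1, d1 - d3]


def output_transform(m):
    """Map the 4 element-wise products back to 2 output values."""
    m1, m2, m3, m4 = m
    return [m1 + m2 + m3, m2 - m3 - m4]


def winograd_f2_3(d, g):
    """Compute 2 outputs of a 3-tap correlation with 4 multiplications."""
    u = filter_transform(g)
    v = input_transform(d)
    # Computation stage: element-wise product in the transformed domain.
    m = [ui * vi for ui, vi in zip(u, v)]
    return output_transform(m)


def direct_conv(d, g):
    """Reference: direct sliding-window correlation (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    tile = [1.0, 2.0, 3.0, 4.0]
    taps = [1.0, 0.0, -1.0]
    print(winograd_f2_3(tile, taps))
    print(direct_conv(tile, taps))
```

In a real CNN layer the same idea is applied in 2D over many tiles and channels, so the element-wise product stage becomes a batch of matrix multiplications; that is where a GEMM micro-kernel such as the one the abstract mentions would come in.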
Taxonomy
Topics: Embedded Systems Design Techniques
