High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures
Xiang Fu, Xinpeng Zhang, Jixiang Ma, Peng Zhao, Shuai Lu, Xu T. Liu

TL;DR
This paper introduces three tensor data layouts for im2win convolution (NHWC, CHWN, and CHWN8) and a set of general optimizations for direct and im2win convolutions on SIMD architectures, reaching up to 95% of theoretical peak performance.
Contribution
It proposes three novel data layouts for im2win convolution and general optimization techniques, enhancing performance on SIMD architectures.
Findings
im2win convolution with the NHWC layout achieves up to 355% speedup over the NCHW layout
The optimizations substantially improve the performance of both im2win and direct convolutions
Optimized im2win and direct convolutions reach up to 95% and 94% of theoretical peak performance, respectively
Abstract
Convolution is the core component within deep neural networks and it is computationally intensive and time consuming. Tensor data layouts significantly impact convolution operations in terms of memory access and computational efficiency. Yet, there is still a lack of comprehensive performance characterization on data layouts on SIMD architectures concerning convolution methods. This paper proposes three novel data layouts for im2win convolution: NHWC, CHWN, and CHWN8, and introduces a set of general optimization techniques for both direct and im2win convolutions. We compare the optimized im2win convolution with the direct convolution and PyTorch's im2col-based convolution across the aforementioned layouts on SIMD machines. The experiments demonstrated that the im2win convolution with the new NHWC layout achieved up to 355% performance speedup over NCHW layout. Our optimizations also…
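To make the layout names concrete, the sketch below computes the flat memory offset of element (n, c, h, w) under each layout. It is an illustration, not the authors' code. The function names are hypothetical, and the CHWN8 indexing is an assumption: the batch dimension is taken to be split into blocks of 8 with the 8 batch elements stored innermost, and N is assumed to be divisible by 8.

```python
from itertools import product

def offset_nchw(n, c, h, w, N, C, H, W):
    # Width is the fastest-varying (innermost) dimension.
    return ((n * C + c) * H + h) * W + w

def offset_nhwc(n, c, h, w, N, C, H, W):
    # Channels are innermost, so all channels of one pixel are contiguous.
    return ((n * H + h) * W + w) * C + c

def offset_chwn(n, c, h, w, N, C, H, W):
    # Batch is innermost, so the same pixel across images is contiguous.
    return ((c * H + h) * W + w) * N + n

def offset_chwn8(n, c, h, w, N, C, H, W, block=8):
    # ASSUMPTION: batch is tiled into blocks of `block` images; within a
    # block the layout is CHWN with `block` batch elements innermost.
    # Requires N % block == 0.
    nb, ni = divmod(n, block)
    return (((nb * C + c) * H + h) * W + w) * block + ni

def all_offsets(fn, N, C, H, W):
    # Enumerate the offset of every element; a valid layout must hit
    # each position in [0, N*C*H*W) exactly once.
    return sorted(
        fn(n, c, h, w, N, C, H, W)
        for n, c, h, w in product(range(N), range(C), range(H), range(W))
    )
```

The innermost dimension decides which neighbouring elements a SIMD load can fetch in one instruction: channels for NHWC, batch elements for CHWN, and a fixed group of 8 batch elements for CHWN8 under the assumption above.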
Taxonomy
Topics: Tensor decomposition and applications · Algorithms and Data Compression · Wireless Communication Networks Research
Methods: Sparse Evolutionary Training · Convolution
