Accelerating CNN inference on long vector architectures via co-design

Sonia Rani Gupta; Nikela Papadopoulou; Miquel Pericas

arXiv:2212.11574·cs.DC·December 23, 2022

Open Access

TL;DR

This research explores co-designing vector architectures for CPU-based CNN inference. It reports performance gains of up to 5x from longer vectors and larger caches, and up to 2.4x from inter-tile parallelization of Winograd kernels.

Contribution

It introduces a co-design approach that tunes vector length and cache size for CNN kernels, together with a novel inter-tile parallelization for Winograd, achieving performance gains of up to 5x and 2.4x, respectively.

Findings

01

Longer vector lengths and larger caches improve CNN kernel performance by up to 5x, relative to a 512-bit vector length and 1MB of L2 cache.

02

Winograd kernels benefit from inter-tile parallelization, achieving 2.4x speedup.

03

Winograd requires smaller caches than im2col+GEMM.
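As background for the Winograd findings, the sketch below shows the standard 1D Winograd F(2,3) transform, which produces two outputs of a 3-tap filter from four inputs using four multiplications instead of six. It is a minimal illustration, not the paper's kernel: the function names are hypothetical, and the comment that the loop over tiles is the dimension one might spread across vector lanes is an assumption about what "inter-tile" means here.

```python
def winograd_f23_filter(g):
    """Transform a 3-tap filter once; the result is reused by every tile."""
    g0, g1, g2 = g
    return (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)


def winograd_f23_tile(d, u):
    """Compute two outputs from four inputs with four multiplications."""
    d0, d1, d2, d3 = d
    m1 = (d0 - d2) * u[0]
    m2 = (d1 + d2) * u[1]
    m3 = (d2 - d1) * u[2]
    m4 = (d1 - d3) * u[3]
    return m1 + m2 + m3, m2 - m3 - m4


def conv1d_winograd(signal, g):
    """Stride-1, unpadded 1D correlation built from F(2,3) tiles."""
    n_out = len(signal) - 2
    if n_out < 2 or n_out % 2 != 0:
        raise ValueError("output length must be a positive multiple of 2")
    u = winograd_f23_filter(g)
    out = []
    # Tiles are independent of each other. Assumption: this is the loop
    # that a vectorized implementation could map across vector lanes.
    for t in range(n_out // 2):
        out.extend(winograd_f23_tile(signal[2 * t:2 * t + 4], u))
    return out


def conv1d_direct(signal, g):
    """Reference implementation: three multiplications per output."""
    return [sum(signal[i + k] * g[k] for k in range(3))
            for i in range(len(signal) - 2)]


signal = [1, 2, 3, 4, 5, 6, 7, 8]
taps = [1, 0, -1]
print(conv1d_winograd(signal, taps) == conv1d_direct(signal, taps))
```

Because each tile reads only its own four inputs and the shared transformed filter, the tiles can be processed in any order or in parallel.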

Abstract

CPU-based inference can be an alternative to off-chip accelerators, and vector architectures are a promising option due to their efficiency. However, the large design space of convolutional algorithms and hardware implementations makes it challenging to select the best options. This paper presents ongoing research into co-designing vector architectures for CPU-based CNN inference, focusing on the im2col+GEMM and Winograd kernels. Using the Gem5 simulator, we examine the impact of various hardware microarchitectural features on RISC-V Vector and ARM-SVE ISAs. We also study the impact of several BLIS-like algorithmic optimizations on im2col+GEMM. Our co-design study shows that longer vector lengths and larger caches can improve performance by 5x with our optimized CNN kernels, compared to a vector length of 512-bit and 1MB of L2 cache. For Winograd, we present a novel approach of…
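For readers unfamiliar with the im2col+GEMM kernel named in the abstract, the following is a minimal pure-Python sketch of the general technique: image patches are unrolled into the columns of a matrix so that convolution becomes a single matrix multiplication. It assumes a single input channel, stride 1, and no padding, and every function name is hypothetical. It does not include the BLIS-like optimizations or the vectorization studied in the paper.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch into one column of a (kh*kw) x (out_h*out_w) matrix."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = [[image[i + di][j + dj]
             for i in range(out_h) for j in range(out_w)]
            for di in range(kh) for dj in range(kw)]
    return cols, out_h, out_w


def gemm(a, b):
    """Naive matrix multiply of an (m x k) matrix by a (k x n) matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def conv2d_im2col(image, kernels):
    """Apply several kernels at once: one GEMM row per kernel."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    cols, out_h, out_w = im2col(image, kh, kw)
    # Flatten each kernel in the same (di, dj) order used by im2col.
    weights = [[k[di][dj] for di in range(kh) for dj in range(kw)]
               for k in kernels]
    flat = gemm(weights, cols)
    # Reshape each flat output row back into an out_h x out_w feature map.
    return [[row[r * out_w:(r + 1) * out_w] for r in range(out_h)]
            for row in flat]


def conv2d_direct(image, kernel):
    """Reference sliding-window correlation for one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[4 * r + c for c in range(4)] for r in range(4)]
kernels = [[[1, 0], [0, -1]], [[1, 1], [1, 1]]]
maps = conv2d_im2col(image, kernels)
print(all(m == conv2d_direct(image, k) for m, k in zip(maps, kernels)))
```

The unrolled matrix duplicates overlapping patch elements, so its footprint grows with the kernel size. That duplication is one common reason im2col+GEMM is sensitive to cache capacity.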

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Advanced Image and Video Retrieval Techniques