ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining
C. Peltekis, D. Filippas, G. Dimitrakopoulos, C. Nicopoulos, D. Pnevmatikatos

TL;DR
ArrayFlex is a systolic array whose pipeline can be configured per CNN layer, trading cycle count against clock frequency. It reduces inference latency by 11% on average and consumes 13%-23% less power than fixed-pipeline designs.
Contribution
It proposes a novel systolic array with configurable pipeline modes, enabling layer-specific optimization for improved efficiency in CNN inference.
Findings
Reduces CNN inference latency by 11% on average.
Consumes 13%-23% less power compared to fixed-pipeline designs.
Achieves 1.4x to 1.8x better energy-delay-product efficiency.
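As a rough consistency check on the three figures above, the sketch below combines the latency and power reductions into an energy-delay-product (EDP) gain. It assumes EDP = energy × delay = power × delay², and that the average latency reduction and the power range can simply be multiplied together; the paper may compute EDP per network, so this is an illustration, not the authors' method. The function name `edp_gain` is made up for this example.

```python
def edp_gain(latency_reduction: float, power_reduction: float) -> float:
    """Baseline EDP divided by improved EDP, assuming EDP = power * delay**2."""
    delay_ratio = 1.0 - latency_reduction   # improved delay / baseline delay
    power_ratio = 1.0 - power_reduction     # improved power / baseline power
    return 1.0 / (power_ratio * delay_ratio ** 2)

# Figures taken from the findings: 11% lower latency, 13%-23% lower power.
low = edp_gain(0.11, 0.13)
high = edp_gain(0.11, 0.23)
print(f"{low:.2f}x to {high:.2f}x")  # prints "1.45x to 1.64x"
```

Under these assumptions the result falls inside the reported 1.4x to 1.8x range. The spread at either end is plausibly due to per-network latency variation, which the single 11% average does not capture.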
Abstract
Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many deep learning applications. For maximum scalability, their computation should combine high performance and energy efficiency. In practice, the convolutions of each CNN layer are mapped to a matrix multiplication that includes all input features and kernels of each layer and is computed using a systolic array. In this work, we focus on the design of a systolic array with configurable pipeline with the goal to select an optimal pipeline configuration for each CNN layer. The proposed systolic array, called ArrayFlex, can operate in normal, or in shallow pipeline mode, thus balancing the execution time in cycles and the operating clock frequency. By selecting the appropriate pipeline configuration per CNN layer, ArrayFlex reduces the inference latency of state-of-the-art CNNs by 11%, on average, as compared to a…
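The per-layer choice described in the abstract can be sketched as picking, for each layer, the mode with the lowest wall-clock latency, where latency = cycles / clock frequency. This is a minimal sketch: the clock frequencies, cycle counts, and layer names below are hypothetical placeholders, not values from the paper, and the names `PipelineMode` and `pick_mode` are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineMode:
    name: str
    clock_ghz: float  # hypothetical clock frequency reachable in this mode

def latency_ns(cycles: int, mode: PipelineMode) -> float:
    """Wall-clock latency in nanoseconds: cycles divided by frequency in GHz."""
    return cycles / mode.clock_ghz

def pick_mode(cycles_by_mode: dict[str, int], modes: dict[str, PipelineMode]) -> str:
    """Return the name of the mode with the lowest latency for one layer."""
    return min(cycles_by_mode, key=lambda m: latency_ns(cycles_by_mode[m], modes[m]))

# Assumption: shallow mode needs fewer cycles but runs at a lower clock.
MODES = {
    "normal": PipelineMode("normal", clock_ghz=2.0),
    "shallow": PipelineMode("shallow", clock_ghz=1.6),
}

# Hypothetical per-layer cycle counts in each mode.
layers = {
    "layer_a": {"normal": 10_000, "shallow": 7_000},  # large cycle saving
    "layer_b": {"normal": 10_000, "shallow": 9_000},  # small cycle saving
}

choices = {name: pick_mode(cycles, MODES) for name, cycles in layers.items()}
print(choices)  # prints {'layer_a': 'shallow', 'layer_b': 'normal'}
```

With these placeholder numbers, `layer_a` saves enough cycles in shallow mode to offset the slower clock (4375 ns versus 5000 ns), while `layer_b` does not (5625 ns versus 5000 ns), so the two layers end up in different modes.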
Taxonomy
Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing
