TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Architecture and Hardware Implementation
Cristian Sestito, Shady Agwa, Themis Prodromakis

TL;DR
This paper introduces TrIM, a novel triangular input movement dataflow for systolic arrays that significantly reduces memory accesses and energy consumption in CNN hardware accelerators, demonstrated on FPGA with high throughput.
Contribution
The paper proposes TrIM, a new dataflow for systolic arrays that reduces memory access by an order of magnitude and improves energy efficiency in CNN hardware implementations.
Findings
Achieves a peak throughput of 453.6 Giga Operations per Second (GOPS).
Reduces memory accesses by up to 3x compared to state-of-the-art.
Up to 11.9x more energy-efficient than existing FPGA accelerators.
Abstract
Modern hardware architectures for Convolutional Neural Networks (CNNs), besides targeting high performance, aim to dissipate limited energy. Reducing the cost of data movement between the computing cores and the memory is one way to lower energy consumption. Systolic arrays are well suited to this objective: they use multiple processing elements that communicate with each other to maximize data utilization, based on dataflows such as weight stationary and row stationary. Motivated by this, we have proposed TrIM, an innovative dataflow based on a triangular movement of inputs, capable of reducing the number of memory accesses by one order of magnitude compared to state-of-the-art systolic arrays. In this paper, we present a TrIM-based hardware architecture for CNNs. As a showcase, the accelerator is implemented on a Field Programmable Gate Array…
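To make the abstract's point about data movement concrete, here is a minimal sketch that counts input fetches for a single-channel, stride-1 convolution with no padding. It compares a naive scheme, in which every K x K window is re-read from memory, against the lower bound in which each input element is fetched once. This is not the TrIM dataflow or the paper's access model. The function names and the 224 x 224 feature-map size are assumptions chosen only for illustration.

```python
def naive_input_reads(h, w, k):
    """Input fetches when every k x k window is re-read from memory.

    Assumes stride 1, no padding, one input channel (illustrative only).
    """
    out_h, out_w = h - k + 1, w - k + 1
    return out_h * out_w * k * k


def ideal_input_reads(h, w):
    """Lower bound: each input element is fetched from memory exactly once."""
    return h * w


if __name__ == "__main__":
    h = w = 224  # hypothetical feature-map size, not taken from the paper
    k = 3        # hypothetical kernel size
    naive = naive_input_reads(h, w, k)
    ideal = ideal_input_reads(h, w)
    print(f"naive reads: {naive}, ideal reads: {ideal}, ratio: {naive / ideal:.2f}")
```

The gap between the two counts is the reuse that a systolic dataflow tries to capture by passing inputs between processing elements rather than fetching them again from memory.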
Taxonomy
Topics: Modular Robots and Swarm Intelligence
Methods: VGG-16
