TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Dataflow and Analytical Modelling
Cristian Sestito, Shady Agwa, Themis Prodromakis

TL;DR
This paper introduces TrIM, a novel dataflow for systolic arrays in CNNs that maximizes local input use, reduces data movement, and significantly improves energy efficiency and throughput compared to existing methods.
Contribution
TrIM is a new dataflow for systolic arrays that reduces data redundancy, minimizes memory access, and enhances performance without increasing on-chip memory requirements.
Findings
~10X fewer memory accesses than the state of the art
Up to 81.8% higher throughput than row stationary
Requires up to 15.6X fewer registers
Abstract
In order to follow the ever-growing computational complexity and data intensity of state-of-the-art AI models, new computing paradigms are being proposed. These paradigms aim at achieving high energy efficiency by mitigating the Von Neumann bottleneck that relates to the energy cost of moving data between the processing cores and the memory. Convolutional Neural Networks (CNNs) are susceptible to this bottleneck, given the massive data they have to manage. Systolic arrays (SAs) are promising architectures to mitigate data transmission cost, thanks to high data utilization of Processing Elements (PEs). These PEs continuously exchange and process data locally based on specific dataflows (such as weight stationary and row stationary), in turn reducing the number of memory accesses to the main memory. In SAs, convolutions are managed either as matrix multiplications or exploiting the…
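The matrix-multiplication route mentioned in the abstract is commonly implemented by unrolling every sliding window of the input into a row of a larger matrix (often called im2col). The sketch below is a minimal pure-Python illustration of that general technique and of the input duplication it causes. It is not the TrIM dataflow, and the names `im2col`, `conv_as_matmul`, and `redundancy` are illustrative assumptions that do not come from the paper.

```python
def im2col(image, k):
    """Unroll every k x k window of a 2-D list into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    rows = []
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            rows.append([image[r + i][c + j] for i in range(k) for j in range(k)])
    return rows


def conv_as_matmul(image, kernel):
    """Compute a 2-D convolution (cross-correlation form) as a matrix-vector product."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    return [sum(a * b for a, b in zip(window, flat_kernel))
            for window in im2col(image, k)]


def redundancy(h, w, k):
    """Ratio of elements in the unrolled matrix to elements in the original input."""
    return (h - k + 1) * (w - k + 1) * k * k / (h * w)


# Toy 5x5 input and a 3x3 horizontal-difference kernel (both hypothetical).
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

unrolled = im2col(image, 3)
outputs = conv_as_matmul(image, kernel)
```

For this toy case the unrolled matrix holds 81 values although the input has only 25, so each input is fetched several times on average. That duplication is the kind of data redundancy the summary above says TrIM is designed to reduce.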
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Neural Networks and Applications · Robotics and Automated Systems
