DiP: A Scalable, Energy-Efficient Systolic Array for Matrix Multiplication Acceleration
Ahmed J. Abdelmaksoud, Shady Agwa, Themis Prodromakis

TL;DR
The paper introduces DiP, a scalable systolic array architecture for matrix multiplication that eliminates synchronization FIFO buffers, improving throughput and energy efficiency, especially on transformer workloads.
Contribution
It proposes the Diagonal-Input and Permutated weight stationary (DiP) dataflow, which removes the synchronization FIFOs that weight-stationary systolic arrays require, leading to significant energy savings and throughput improvements over existing systolic array designs.
Findings
Up to 50% throughput improvement over conventional architectures.
Achieves 2.02x better energy efficiency per area in hardware simulations.
Outperforms TPU-like architectures on transformer workloads with up to 1.81x energy savings.
Abstract
Transformers are gaining increasing attention across Natural Language Processing (NLP) application domains due to their outstanding accuracy. However, these data-intensive models add significant performance demands to the existing computing architectures. Systolic array architectures, adopted by commercial AI computing platforms like Google TPUs, offer energy-efficient data reuse but face throughput and energy penalties due to input-output synchronization via First-In-First-Out (FIFO) buffers. This paper proposes a novel scalable systolic array architecture featuring Diagonal-Input and Permutated weight stationary (DiP) dataflow for matrix multiplication acceleration. The proposed architecture eliminates the synchronization FIFOs required by state-of-the-art weight stationary systolic arrays. Beyond the area, power, and energy savings achieved by eliminating these FIFOs, DiP…
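The idea of feeding inputs without skewing FIFOs can be illustrated with a small functional model. The sketch below is not the paper's implementation. It assumes one plausible reading of the dataflow named in the abstract: every row of processing elements (PEs) receives a whole input vector at once, forwards it to the next PE row rotated by one position (the "diagonal" movement), and column c of the weight matrix is pre-rotated by c positions (the "permutated" weights). The function names and the rotation direction are hypothetical choices made for illustration.

```python
import numpy as np

def permute_weights(W):
    """Rotate column c of W upward by c, so Wp[r, c] = W[(r + c) % n, c].
    (Assumed permutation; chosen to match the rotation used below.)"""
    n = W.shape[0]
    Wp = np.empty_like(W)
    for c in range(n):
        Wp[:, c] = np.roll(W[:, c], -c)
    return Wp

def diagonal_input_matmul(X, W):
    """Cycle-by-cycle functional model of an n x n weight-stationary array
    with unskewed inputs. Returns (X @ W, number of cycles simulated)."""
    m, n = X.shape
    Wp = permute_weights(W)
    x_reg = np.zeros((n, n), dtype=X.dtype)  # input vector held by each PE row
    p_reg = np.zeros((n, n), dtype=X.dtype)  # partial sums held by each PE row
    tag = [None] * n                         # which input row each PE row holds
    out = np.zeros((m, n), dtype=X.dtype)
    cycles = m + n - 1
    for t in range(cycles):
        # Update rows bottom-up so row r reads row r-1's previous-cycle state.
        for r in range(n - 1, 0, -1):
            x_reg[r] = np.roll(x_reg[r - 1], -1)       # diagonal hand-off
            p_reg[r] = p_reg[r - 1] + x_reg[r] * Wp[r]  # accumulate downward
            tag[r] = tag[r - 1]
        # Row 0 takes a full, unskewed input row (or zeros once X is drained).
        x_reg[0] = X[t] if t < m else 0
        p_reg[0] = x_reg[0] * Wp[0]
        tag[0] = t if t < m else None
        # A complete output row leaves the bottom PE row, all columns at once.
        if tag[n - 1] is not None:
            out[tag[n - 1]] = p_reg[n - 1]
    return out, cycles

rng = np.random.default_rng(0)
X = rng.integers(-4, 5, size=(6, 4))
W = rng.integers(-4, 5, size=(4, 4))
Y, cycles = diagonal_input_matmul(X, W)
assert np.array_equal(Y, X @ W)
```

In this model each output row leaves the bottom PE row with all columns aligned, so neither the inputs nor the outputs need triangular skew/deskew buffers. The cycle count it returns belongs to this toy model only and should not be read as the paper's latency figure.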
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Embedded Systems Design Techniques
