Systolic Array Data Flows for Efficient Matrix Multiplication in Deep Neural Networks

Tejas Raja

arXiv:2410.22595·cs.AR·October 31, 2024

TL;DR

This paper analyzes different systolic array data flows for matrix multiplication in deep neural networks, demonstrating how choosing the optimal data flow can significantly reduce energy consumption in AI hardware accelerators.

Contribution

It compares the three main systolic array data flows (Weight Stationary, Input Stationary, and Output Stationary) and their energy efficiency across a range of matrix sizes, using the SCALE-Sim simulator.

Findings

1. Weight Stationary often reduces energy for large matrices
2. Input Stationary is more efficient for smaller matrices
3. Selecting the right data flow optimizes energy use in DNN accelerators

Abstract

This paper discusses how systolic arrays can improve matrix multiplication for deep neural networks (DNNs). With AI models like OpenAI's GPT now containing trillions of parameters, the need for efficient matrix multiplication is more critical than ever. The paper covers the three main systolic array data flows: Weight Stationary (WS), Input Stationary (IS), and Output Stationary (OS). The energy consumption and efficiency of each data flow are calculated across various matrix sizes using the SCALE-Sim simulator. The results show that selecting the right data flow for a specific matrix configuration can drastically reduce energy consumption. The conclusions offer insights into optimizing hardware for AI and machine learning applications, with potential improvements in the design of energy-efficient DNN accelerators.
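To make the three data flows concrete, the following is a minimal loop-level sketch in Python, not the paper's code and not a model of SCALE-Sim. It assumes the input matrix `A` has shape M×K and the weight matrix `B` has shape K×N, and it only shows which operand is held fixed in the innermost loop under each data flow. The function names are hypothetical, and the sketch does not model array timing, memory traffic, or energy.

```python
def matmul_output_stationary(A, B):
    """OS: each output C[i][j] stays in one accumulator while inputs and weights stream past."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0  # the partial sum is the stationary value
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def matmul_weight_stationary(A, B):
    """WS: each weight B[k][j] is pinned while all input rows stream past it."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for k in range(K):
        for j in range(N):
            w = B[k][j]  # the weight is the stationary value
            for i in range(M):
                C[i][j] += A[i][k] * w
    return C


def matmul_input_stationary(A, B):
    """IS: each input A[i][k] is pinned while all weight columns stream past it."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for k in range(K):
            a = A[i][k]  # the input is the stationary value
            for j in range(N):
                C[i][j] += a * B[k][j]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]  # input, 2x2 (illustrative values)
    B = [[5, 6], [7, 8]]  # weights, 2x2 (illustrative values)
    for fn in (matmul_output_stationary,
               matmul_weight_stationary,
               matmul_input_stationary):
        print(fn.__name__, fn(A, B))  # each prints [[19, 22], [43, 50]]
```

All three orderings compute the same product; what changes is which operand is reused in place and which operands must be moved on every step, and that movement is what drives the differences in energy that the paper measures in simulation.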

Taxonomy

Topics: Neural Networks and Applications · Parallel Computing and Optimization Techniques · Tensor decomposition and applications