GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse

Chenyang Ai; Lechuan Zhao; Zhijie Huang; Cangyuan Li; Xinan Wang; Ying Wang

arXiv:2405.02196·cs.AR·May 6, 2024

TL;DR

This paper introduces GTA, a tensor accelerator that improves area efficiency and data reuse, reporting up to 25.83X speedup and up to 8.76X better memory efficiency than general accelerators such as VPU, GPGPU, and CGRA.

Contribution

The paper proposes a tensor accelerator built on the systolic architecture, with better area efficiency and data reuse, together with a hardware scheduling space that covers dataflow, precision, and array resizing.

Findings

1. GTA achieves up to 8.76X improvement in memory efficiency.
2. GTA delivers up to 25.83X speedup over existing accelerators.
3. The architecture supports flexible dataflow, precision, and array resizing.

Abstract

Recently, tensor algebra has found significant application across various domains. Each operator in tensor algebra has a different computational workload and precision. However, current general accelerators, such as VPU, GPGPU, and CGRA, support tensor operators with low energy and area efficiency. This paper conducts an in-depth exploration of general accelerators for tensor processing. First, we find the similarity between matrix multiplication and precision multiplication, and create a method for classifying tensor operators. Then, we apply these two discoveries and introduce the systolic architecture into a general-purpose accelerator. On this basis, we propose a new General Tensor Accelerator (GTA), which has better area efficiency and data reuse. Furthermore, we create a large hardware scheduling space consisting of dataflow, precision, and array resizing. Our evaluation results…
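
The abstract does not spell out how matrix multiplication and precision multiplication resemble each other. One possible reading, assumed here and not taken from the paper, is that a wide multiplication can be split into narrow limbs whose pairwise products form a grid of multiply-accumulate operations, much like the grid a systolic array evaluates for a matrix product. The Python sketch below illustrates that reading only; the function names, the 8-bit limb width, and the 4-limb operand size are hypothetical choices, not GTA's design.

```python
# Illustrative sketch: a wide multiply expressed as a grid of narrow multiplies.
# All names and sizes here are assumptions for illustration, not from the paper.

def split_limbs(x, limb_bits=8, n_limbs=4):
    """Split a non-negative integer into n_limbs limbs, least significant first."""
    mask = (1 << limb_bits) - 1
    return [(x >> (limb_bits * i)) & mask for i in range(n_limbs)]

def limb_outer_product(a_limbs, b_limbs):
    """Compute every pairwise limb product, laid out as a 2-D grid.

    Cell (i, j) holds a_limbs[i] * b_limbs[j], the same shape of work as
    the multiply step of a matrix product on a 2-D array of multipliers.
    """
    return [[a * b for b in b_limbs] for a in a_limbs]

def combine(partials, limb_bits=8):
    """Sum the partial products, weighting cell (i, j) by 2**(limb_bits*(i+j))."""
    total = 0
    for i, row in enumerate(partials):
        for j, p in enumerate(row):
            total += p << (limb_bits * (i + j))
    return total

def wide_multiply(x, y, limb_bits=8, n_limbs=4):
    """Multiply two integers that each fit in limb_bits * n_limbs bits."""
    a = split_limbs(x, limb_bits, n_limbs)
    b = split_limbs(y, limb_bits, n_limbs)
    return combine(limb_outer_product(a, b), limb_bits)

if __name__ == "__main__":
    x, y = 0xDEADBEEF, 0x12345678
    # The limb-based result should match Python's built-in multiplication.
    print(wide_multiply(x, y) == x * y)
```

Under this assumption, the same array of narrow multipliers could serve either many low-precision products or fewer high-precision ones, which is one way a precision dimension might enter a hardware scheduling space.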

Taxonomy

Topics: Computational Physics and Python Applications · Parallel Computing and Optimization Techniques · Tensor decomposition and applications