GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse
Chenyang Ai, Lechuan Zhao, Zhijie Huang, Cangyuan Li, Xinan Wang, Ying Wang

TL;DR
This paper introduces GTA, a general tensor accelerator that improves area efficiency and data reuse, reporting up to 25.83X speedup and up to 8.76X better memory efficiency than existing general accelerators such as VPU, GPGPU, and CGRA.
Contribution
The paper proposes a general tensor accelerator built on a systolic architecture, with improved area efficiency and data reuse, together with a hardware scheduling space covering dataflow, precision, and array resizing.
Findings
GTA achieves up to 8.76X memory efficiency improvements.
GTA delivers up to 25.83X speedup over existing accelerators.
The architecture supports flexible dataflow, precision, and array resizing.
Abstract
Recently, tensor algebra has seen significant application across various domains. Each tensor operator has a different computational workload and precision. However, current general accelerators, such as VPU, GPGPU, and CGRA, support tensor operators with low energy and area efficiency. This paper conducts an in-depth exploration of general accelerators for tensor processing. First, we identify the similarity between matrix multiplication and precision multiplication, and create a method for classifying tensor operators. Then, we apply these two discoveries and introduce the systolic architecture into a general-purpose accelerator. On this basis, we propose a new General Tensor Accelerator (GTA), which has better area efficiency and data reuse. Furthermore, we create a large hardware scheduling space consisting of dataflow, precision, and array resizing. Our evaluation results…
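The abstract does not spell out the similarity between matrix multiplication and precision multiplication. One common way to picture it, assumed here and not taken from the paper, is to read "precision multiplication" as multiplying wide integers split into narrow limbs: the limb-by-limb partial products form the same grid of multiply-accumulate operations as an outer product, which is the pattern a systolic array executes. The function names and the 8-bit limb width below are hypothetical choices for illustration only.

```python
def split_limbs(x, limb_bits, n_limbs):
    """Split a non-negative integer into n_limbs little-endian limbs."""
    mask = (1 << limb_bits) - 1
    return [(x >> (limb_bits * i)) & mask for i in range(n_limbs)]


def limb_multiply(a, b, limb_bits=8, n_limbs=4):
    """Multiply two wide integers using only narrow limb products.

    Illustrative sketch, not the GTA design: each a_limbs[i] * b_limbs[j]
    is one narrow multiply, and the full grid of them is an outer product.
    """
    a_limbs = split_limbs(a, limb_bits, n_limbs)
    b_limbs = split_limbs(b, limb_bits, n_limbs)

    # Outer product of the two limb vectors: an n_limbs x n_limbs grid of
    # narrow multiplies, one per (i, j) pair.
    partial = [[ai * bj for bj in b_limbs] for ai in a_limbs]

    # Accumulate each partial product at the weight of its limb positions.
    result = 0
    for i, row in enumerate(partial):
        for j, p in enumerate(row):
            result += p << (limb_bits * (i + j))
    return result


# Usage: a 32-bit by 32-bit product built from 8-bit multiplies.
product = limb_multiply(0xDEADBEEF, 0x12345678)
```

In this sketch the inputs must fit in `limb_bits * n_limbs` bits; changing `limb_bits` trades the number of narrow multiplies against their width, which is loosely the kind of precision knob the abstract's scheduling space refers to.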
Taxonomy
Topics: Computational Physics and Python Applications · Parallel Computing and Optimization Techniques · Tensor decomposition and applications
