Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity

Zi Yu Xue; Yannan Nellie Wu; Joel S. Emer; Vivienne Sze

arXiv:2310.00192·cs.AR·June 27, 2024

TL;DR

This paper introduces overbooking, a speculative tiling method for sparse tensor algebra accelerators. Tiles are sized so that some may overflow the buffer, and a hardware mechanism handles those overflows in a controlled way. The result is better buffer utilization and performance, demonstrated across diverse workloads.

Contribution

The paper proposes the overbooking approach together with two supporting pieces: Tailors, a hardware mechanism, and Swiftiles, a statistical tile size selection method. Together they improve buffer utilization and performance in sparse tensor accelerators.

Findings

01. Average speedup of 52.7x over ExTensor without tiling.

02. Average energy reduction of 22.5x over ExTensor.

03. Effective buffer utilization across 22 workloads.

Abstract

Sparse tensor algebra is a challenging class of workloads to accelerate due to low arithmetic intensity and varying sparsity patterns. Prior sparse tensor algebra accelerators have explored tiling sparse data to increase exploitable data reuse and improve throughput, but typically allocate tile size in a given buffer for the worst-case data occupancy. This severely limits the utilization of available memory resources and reduces data reuse. Other accelerators employ complex tiling during preprocessing or at runtime to determine the exact tile size based on its occupancy. This paper proposes a speculative tensor tiling approach, called overbooking, to improve buffer utilization by taking advantage of the distribution of nonzero elements in sparse tensors to construct larger tiles with greater data reuse. To ensure correctness, we propose a low-overhead hardware mechanism, Tailors, that…
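
The contrast between worst-case tile sizing and overbooking can be illustrated with a small Python sketch. It is not the paper's algorithm: the function names, the row-wise tiling, the `target_fit` parameter, and the toy nonzero counts are all assumptions made for illustration. The sketch picks the largest tile (in rows) such that a chosen fraction of tiles fits in the buffer; `target_fit=1.0` corresponds to sizing for the worst case, while a lower value lets a few dense tiles overflow, which is the situation the abstract says a hardware mechanism must handle.

```python
def tile_occupancies(row_nnz, rows_per_tile):
    """Nonzeros per tile when consecutive rows are grouped rows_per_tile at a time."""
    return [sum(row_nnz[i:i + rows_per_tile])
            for i in range(0, len(row_nnz), rows_per_tile)]


def fit_fraction(row_nnz, rows_per_tile, capacity):
    """Fraction of tiles whose nonzero count fits within the buffer capacity."""
    occ = tile_occupancies(row_nnz, rows_per_tile)
    return sum(o <= capacity for o in occ) / len(occ)


def pick_rows_per_tile(row_nnz, capacity, target_fit=1.0):
    """Largest rows_per_tile for which at least target_fit of the tiles fit.

    target_fit=1.0 is worst-case sizing (no tile may overflow).
    target_fit<1.0 is a hypothetical overbooking policy (some tiles may overflow).
    Returns 0 if no tile size meets the target.
    """
    best = 0
    for r in range(1, len(row_nnz) + 1):
        if fit_fraction(row_nnz, r, capacity) >= target_fit:
            best = r
    return best


# Toy data (assumed): 32 rows, one dense row among sparse ones.
row_nnz = [1] * 7 + [8] + [1] * 24
capacity = 8  # buffer holds 8 nonzeros

worst_case = pick_rows_per_tile(row_nnz, capacity)                   # 1 row per tile
overbooked = pick_rows_per_tile(row_nnz, capacity, target_fit=0.75)  # 8 rows per tile
print(worst_case, overbooked, tile_occupancies(row_nnz, overbooked))
```

On this toy input the single dense row forces worst-case sizing down to one row per tile, leaving the buffer mostly empty for the other 31 rows. Accepting that one tile in four overflows allows eight rows per tile, so three of the four tiles fill the buffer exactly.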
