Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
Zi Yu Xue, Yannan Nellie Wu, Joel S. Emer, Vivienne Sze

TL;DR
This paper introduces overbooking, a speculative tiling method for sparse tensor algebra accelerators. It allows controlled buffer overflows, which a hardware mechanism handles, and thereby improves buffer utilization and performance across diverse workloads.
Contribution
The paper proposes overbooking, supported by a hardware mechanism (Tailors) and a statistical tile size selection method (Swiftiles), to improve buffer utilization and performance in sparse tensor accelerators.
Findings
Average speedup of 52.7x over ExTensor without tiling.
Average energy reduction of 22.5x over ExTensor without tiling.
Results are averaged across a suite of 22 sparse tensor algebra workloads.
Abstract
Sparse tensor algebra is a challenging class of workloads to accelerate due to low arithmetic intensity and varying sparsity patterns. Prior sparse tensor algebra accelerators have explored tiling sparse data to increase exploitable data reuse and improve throughput, but typically allocate tile size in a given buffer for the worst-case data occupancy. This severely limits the utilization of available memory resources and reduces data reuse. Other accelerators employ complex tiling during preprocessing or at runtime to determine the exact tile size based on its occupancy. This paper proposes a speculative tensor tiling approach, called overbooking, to improve buffer utilization by taking advantage of the distribution of nonzero elements in sparse tensors to construct larger tiles with greater data reuse. To ensure correctness, we propose a low-overhead hardware mechanism, Tailors, that…
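The contrast the abstract draws between worst-case tile sizing and overbooking can be illustrated with a toy model. The sketch below is not the paper's Swiftiles algorithm and does not model the Tailors hardware. It simply checks every candidate tile size exhaustively, and all names in it (`pick_tile_size`, `row_nnz`, `CAPACITY`, the 10% overflow budget) are assumptions made for illustration. Tiles are formed by grouping consecutive rows of a hypothetical matrix whose nonzeros are concentrated in a few dense rows.

```python
def tile_occupancies(row_nnz, rows_per_tile):
    """Nonzeros per tile when consecutive rows are grouped into tiles."""
    return [sum(row_nnz[i:i + rows_per_tile])
            for i in range(0, len(row_nnz), rows_per_tile)]


def overflow_fraction(row_nnz, rows_per_tile, capacity):
    """Fraction of tiles whose occupancy exceeds the buffer capacity."""
    occ = tile_occupancies(row_nnz, rows_per_tile)
    return sum(o > capacity for o in occ) / len(occ)


def buffer_utilization(row_nnz, rows_per_tile, capacity):
    """Mean fraction of the buffer filled per tile (overflow is capped)."""
    occ = tile_occupancies(row_nnz, rows_per_tile)
    return sum(min(o, capacity) for o in occ) / (capacity * len(occ))


def pick_tile_size(row_nnz, capacity, max_overflow, candidates):
    """Largest candidate whose overflow fraction is at most max_overflow."""
    ok = [r for r in candidates
          if overflow_fraction(row_nnz, r, capacity) <= max_overflow]
    return max(ok) if ok else None


# Hypothetical skewed matrix: 256 rows with 1 nonzero each, except 4 dense rows.
row_nnz = [1] * 256
for dense_row in (0, 64, 128, 192):
    row_nnz[dense_row] = 32

CAPACITY = 32                          # buffer size in nonzeros (assumed)
CANDIDATES = [1, 2, 4, 8, 16, 32, 64]  # rows per tile to consider (assumed)

# Worst-case sizing: no tile may overflow the buffer.
worst_case = pick_tile_size(row_nnz, CAPACITY, 0.0, CANDIDATES)
# Overbooked sizing: tolerate overflow in up to 10% of tiles (assumed budget).
overbooked = pick_tile_size(row_nnz, CAPACITY, 0.10, CANDIDATES)

for label, rows in (("worst-case", worst_case), ("overbooked", overbooked)):
    util = buffer_utilization(row_nnz, rows, CAPACITY)
    print(f"{label}: {rows} rows per tile, mean buffer utilization {util:.3f}")
```

In this toy input the four dense rows force worst-case sizing down to one row per tile, leaving the buffer nearly empty for most tiles. Accepting overflow in a small fraction of tiles permits tiles four times larger. A real design still needs a mechanism to keep the overflowing tiles correct, which is the role the abstract assigns to Tailors.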
