AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers

Shikhar Tuli; Niraj K. Jha

arXiv:2302.14705·cs.AR·May 2, 2023·6 cites

TL;DR

AccelTran introduces a sparsity-aware transformer accelerator with a dynamic pruning scheme, DynaTran, that enhances throughput and energy efficiency by reducing ineffectual computations during inference.

Contribution

This work presents DynaTran, a novel runtime activation pruning method, and AccelTran, an accelerator architecture optimized for transformer models. Together they achieve higher activation sparsity and higher inference efficiency.

Findings

01. DynaTran surpasses state-of-the-art pruning strategies in both accuracy and sparsity.

02. AccelTran-Edge achieves 330K× higher throughput and 93K× lower energy consumption than a Raspberry Pi.

03. AccelTran-Server achieves 5.73× higher throughput and 3.69× lower energy consumption than Energon.

Abstract

Self-attention-based transformer models have achieved tremendous success in the domain of natural language processing. Despite their efficacy, accelerating the transformer is challenging due to its quadratic computational complexity and large activation sizes. Existing transformer accelerators attempt to prune its tokens to reduce memory access, albeit with high compute overheads. Moreover, previous works directly operate on large matrices involved in the attention operation, which limits hardware utilization. In order to address these challenges, this work proposes a novel dynamic inference scheme, DynaTran, which prunes activations at runtime with low overhead, substantially reducing the number of ineffectual operations. This improves the throughput of transformer inference. We further propose tiling the matrices in transformer operations along with diverse dataflows to improve data…
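The idea of pruning activations at runtime to cut ineffectual operations can be illustrated with a small sketch. This is not the authors' implementation and does not use the API of the jha-lab/acceltran repository. It assumes, for illustration only, that pruning means zeroing activations whose magnitude falls below a threshold `tau`; the function names `prune_activations`, `sparsity`, and `sparse_matmul` are hypothetical, and the matrices are made-up toy values.

```python
# Illustrative sketch only: magnitude thresholding is an ASSUMPTION about how
# runtime activation pruning might work; all names here are hypothetical.

def prune_activations(matrix, tau):
    """Zero every entry whose magnitude is below the threshold tau."""
    return [[x if abs(x) >= tau else 0.0 for x in row] for row in matrix]


def sparsity(matrix):
    """Return the fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for x in row if x == 0.0)
    return zeros / total


def sparse_matmul(a, b):
    """Multiply a (m x k) by b (k x n), skipping products with a zero operand.

    Returns the product and the number of multiply-accumulates performed.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    macs = 0
    for i in range(m):
        for p in range(k):
            if a[i][p] == 0.0:
                continue  # ineffectual: every product in this inner loop is zero
            for j in range(n):
                if b[p][j] == 0.0:
                    continue  # ineffectual: zero weight
                out[i][j] += a[i][p] * b[p][j]
                macs += 1
    return out, macs


# Toy data (hypothetical values, not taken from the paper).
activations = [[0.9, -0.02, 0.4], [0.01, -0.7, 0.03]]
weights = [[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]]

pruned = prune_activations(activations, tau=0.05)
dense_out, dense_macs = sparse_matmul(activations, weights)
pruned_out, pruned_macs = sparse_matmul(pruned, weights)

print("activation sparsity after pruning:", sparsity(pruned))
print("multiply-accumulates before/after:", dense_macs, pruned_macs)
```

In this toy case, raising `tau` increases sparsity and lowers the operation count, at the cost of a larger deviation between `pruned_out` and `dense_out`. The accuracy side of that trade-off is what a pruning scheme has to manage.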

Code & Models

Repositories

jha-lab/acceltran (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Topic Modeling · Ferroelectric and Negative Capacitance Devices

Methods: Pruning