Systolic Arrays and Structured Pruning Co-design for Efficient Transformers in Edge Systems
Pedro Palacios, Rafael Medina, Jean-Luc Rouas, Giovanni Ansaloni, David Atienza

TL;DR
This paper presents a co-design framework combining structured pruning and systolic array acceleration to optimize transformer deployment on edge devices, achieving significant speedups with minimal quality loss.
Contribution
It introduces a novel cross-stack co-design approach that jointly optimizes pruning and hardware configuration for efficient transformer inference at the edge.
Findings
Up to 44% speedup in system performance.
Only 1.4% word error rate degradation on LibriSpeech.
Effective trade-offs between sparsity and systolic array size are demonstrated.
Abstract
Efficient deployment of resource-intensive transformers on edge devices necessitates cross-stack optimization. We thus study the interrelation between structured pruning and systolic acceleration, matching the size of pruned blocks with the systolic array dimensions. In this setting, computations of pruned weight blocks can be skipped, reducing run-time and energy consumption, but potentially impacting quality of service (QoS). To evaluate the trade-offs between systolic array size and sparsity opportunities, we present a novel co-design framework that integrates algorithmic optimization, system simulation, and hardware design. Targeting speech recognition and machine translation with transformers as case studies, we analyze how configuration choices across the stack affect performance metrics. Results demonstrate that structured pruning on systems featuring systolic array acceleration…
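The abstract's central mechanism is aligning pruned weight blocks with the systolic array's tile size so that all-zero tiles never enter the array. It can be illustrated with a minimal pure-Python sketch. This is not the authors' framework. The helper names (`prune_blocks`, `tiled_matmul_skip`), the L1-norm block ranking, and the square S x S tile model are assumptions chosen for illustration.

```python
"""Sketch: block-structured pruning aligned to a systolic-array tile, with tile skipping.

Assumptions (hypothetical, not the paper's code): weights are a dense
row-major list of lists, the systolic array is modeled as an S x S tile
engine, and blocks are ranked for pruning by their L1 norm.
"""
from typing import List, Set, Tuple

Matrix = List[List[float]]


def block_l1_norms(w: Matrix, s: int) -> List[Tuple[float, int, int]]:
    """Return (L1 norm, block_row, block_col) for every s x s block of w."""
    rows, cols = len(w), len(w[0])
    assert rows % s == 0 and cols % s == 0, "dims must be multiples of the array size"
    norms = []
    for bi in range(rows // s):
        for bj in range(cols // s):
            total = sum(abs(w[i][j])
                        for i in range(bi * s, (bi + 1) * s)
                        for j in range(bj * s, (bj + 1) * s))
            norms.append((total, bi, bj))
    return norms


def prune_blocks(w: Matrix, s: int, sparsity: float) -> Tuple[Matrix, Set[Tuple[int, int]]]:
    """Zero the lowest-norm fraction `sparsity` of s x s blocks.

    Returns a pruned copy of w and the set of pruned (block_row, block_col) indices.
    """
    norms = sorted(block_l1_norms(w, s))
    n_prune = int(round(sparsity * len(norms)))
    pruned = {(bi, bj) for _, bi, bj in norms[:n_prune]}
    out = [row[:] for row in w]
    for bi, bj in pruned:
        for i in range(bi * s, (bi + 1) * s):
            for j in range(bj * s, (bj + 1) * s):
                out[i][j] = 0.0
    return out, pruned


def tiled_matmul_skip(x: Matrix, w: Matrix, s: int,
                      pruned: Set[Tuple[int, int]]) -> Tuple[Matrix, int, int]:
    """Compute y = x @ w tile by tile, skipping weight tiles listed in `pruned`.

    Each executed tile models one pass of an s x s weight tile through the
    systolic array. Returns (y, tiles_executed, tiles_skipped).
    """
    m, k, n = len(x), len(w), len(w[0])
    y = [[0.0] * n for _ in range(m)]
    executed = skipped = 0
    for bi in range(k // s):          # block row of w: a slice of the reduction dim
        for bj in range(n // s):      # block col of w: a slice of the output columns
            if (bi, bj) in pruned:
                skipped += 1          # an all-zero tile contributes nothing to y
                continue
            executed += 1
            for r in range(m):
                for j in range(bj * s, (bj + 1) * s):
                    acc = 0.0
                    for i in range(bi * s, (bi + 1) * s):
                        acc += x[r][i] * w[i][j]
                    y[r][j] += acc
    return y, executed, skipped


if __name__ == "__main__":
    import random

    random.seed(0)
    K = N = 8
    w = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(K)]
    x = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(2)]
    for s in (2, 4):  # two hypothetical array sizes, same 50% block sparsity
        wp, pruned = prune_blocks(w, s, sparsity=0.5)
        y, done, skip = tiled_matmul_skip(x, wp, s, pruned)
        print(f"array {s}x{s}: executed {done} tiles, skipped {skip}")
```

Running the demo prints `array 2x2: executed 8 tiles, skipped 8` and `array 4x4: executed 2 tiles, skipped 2`. Because the pruning granularity equals the tile size in this sketch, every pruned block maps to exactly one skipped tile. With a mismatched granularity, partially zero tiles would still occupy the array. A smaller array offers finer blocks to choose from at the same sparsity, which hints at the sparsity versus array-size trade-off the paper studies.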
