Hemlet: A Heterogeneous Compute-in-Memory Chiplet Architecture for Vision Transformers with Group-Level Parallelism
Cong Wang, Zexin Fu, Jiayi Huang, Shanshi Huang

TL;DR
Hemlet is a scalable, heterogeneous compute-in-memory (CIM) chiplet architecture that accelerates Vision Transformers (ViTs) through group-level parallelism and system-level dataflow optimizations, achieving 2.41x to 5.74x speedups and an energy efficiency of 4.98 TOPS/W.
Contribution
This work introduces Hemlet, a chiplet-based CIM system that uses group-level parallelism to accelerate ViTs scalably and efficiently. It targets two problems: the size limit of monolithic CIM chips and the costly inter-chiplet communication that chiplet designs introduce.
Findings
Achieves 2.41x to 5.74x speedup across configurations.
Reaches a throughput of 9.56 TOPS.
Delivers an energy efficiency of 4.98 TOPS/W.
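As a rough sanity check on the headline figures, the efficiency can be converted into energy per operation: 1 TOPS/W is 10^12 operations per joule, i.e. one operation per picojoule, so 4.98 TOPS/W corresponds to about 0.2 pJ per operation. The sketch below also divides throughput by efficiency to get an implied power draw; that second step assumes, without confirmation from this summary, that the 9.56 TOPS and 4.98 TOPS/W figures were measured at the same operating point. How "operations" are counted (for example, whether a multiply-accumulate counts as one or two) is not stated here, and the function names are hypothetical.

```python
# Unit conversions for the reported efficiency figures (illustrative sketch).
# Assumption: 1 TOPS = 1e12 operations per second and 1 W = 1 J/s.

EFFICIENCY_TOPS_PER_W = 4.98  # reported energy efficiency
THROUGHPUT_TOPS = 9.56        # reported throughput


def energy_per_op_pj(tops_per_w: float) -> float:
    """Energy per operation in picojoules.

    tops_per_w TOPS/W = tops_per_w * 1e12 ops per joule, so one op costs
    1 / (tops_per_w * 1e12) J = (1 / tops_per_w) pJ; the 1e12 factors cancel.
    """
    return 1.0 / tops_per_w


def implied_power_w(tops: float, tops_per_w: float) -> float:
    """Power in watts, valid only if both figures share one operating point (assumption)."""
    return tops / tops_per_w


if __name__ == "__main__":
    print(f"{energy_per_op_pj(EFFICIENCY_TOPS_PER_W):.3f} pJ/op")
    print(f"{implied_power_w(THROUGHPUT_TOPS, EFFICIENCY_TOPS_PER_W):.2f} W")
```

Running it prints roughly 0.201 pJ/op and 1.92 W; the power figure is only meaningful under the same-operating-point assumption above.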
Abstract
Vision Transformers (ViTs) have established new performance benchmarks in vision tasks such as image recognition and object detection. However, these advancements come with significant demands for memory and computational resources, presenting challenges for hardware deployment. Heterogeneous compute-in-memory (CIM) accelerators have emerged as a promising solution for enabling energy-efficient deployment of ViTs. Despite this potential, monolithic CIM-based designs face scalability issues due to the size limitations of a single chip. To address this challenge, emerging chiplet-based techniques offer a more scalable alternative. However, chiplet designs come with their own costs, as they introduce expensive communication, which can hinder improvements in throughput. This work introduces Hemlet, a heterogeneous CIM chiplet system designed to accelerate ViT workloads. Hemlet enables…
Taxonomy
Topics: Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors · Parallel Computing and Optimization Techniques
