A 28nm 0.22 µJ/token memory-compute-intensity-aware CNN-Transformer accelerator with hybrid-attention-based layer-fusion and cascaded pruning for semantic-segmentation
Pingcheng Dong, Yonghao Tan, Xuejiao Liu, Peng Luo, Yu Liu, Luhong Liang, Yitong Zhou, Di Pang, Man-To Yung, Dong Zhang, Xijie Huang, Shih-Yang Liu, Yongkun Wu, Fengshi Tian, Chi-Ying Tsui, Fengbin Tu, Kwang-Ting Cheng

TL;DR
This paper introduces a 28nm CNN-Transformer accelerator for semantic segmentation that combines a hybrid attention unit, a layer-fusion scheduler, and a cascaded feature-map pruner, reducing energy by 3.86x to 10.91x relative to previous designs.
Contribution
The paper presents a 28nm accelerator that combines hybrid attention, layer fusion, and cascaded pruning, reaching 0.22 µJ/token on semantic-segmentation workloads.
Findings
Achieves a 3.86x to 10.91x energy reduction compared to previous designs.
Attains a peak energy efficiency of 52.90 TOPS/W (INT8).
Features a hybrid attention unit, a layer-fusion scheduler, and a cascaded feature-map pruner.
Abstract
This work presents a 28nm 13.93 mm² CNN-Transformer accelerator for semantic segmentation, achieving a 3.86x to 10.91x energy reduction over previous designs. It features a hybrid attention unit, a layer-fusion scheduler, and a cascaded feature-map pruner, with a peak energy efficiency of 52.90 TOPS/W (INT8).
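To put the headline figures in more familiar units, the short sketch below converts the reported peak efficiency (52.90 TOPS/W) into energy per operation and the reported 0.22 µJ/token into tokens per joule. It is plain unit arithmetic, not the authors' methodology; the function names are hypothetical, and the two figures may come from different operating points, so they should not be combined into an operations-per-token estimate.

```python
def energy_per_op_fj(tops_per_watt: float) -> float:
    """Convert an efficiency in TOPS/W to femtojoules per operation.

    1 TOPS/W equals 1e12 operations per joule, so the energy per
    operation is the reciprocal, rescaled from joules to femtojoules.
    """
    joules_per_op = 1.0 / (tops_per_watt * 1e12)
    return joules_per_op * 1e15


def tokens_per_joule(uj_per_token: float) -> float:
    """Convert an energy cost in microjoules per token to tokens per joule."""
    return 1.0 / (uj_per_token * 1e-6)


if __name__ == "__main__":
    # Figures quoted in the title and abstract of the paper.
    peak_fj = energy_per_op_fj(52.90)
    tokens = tokens_per_joule(0.22)
    print(f"Peak efficiency: about {peak_fj:.2f} fJ per INT8 operation")
    print(f"Per-token energy: about {tokens:.3e} tokens per joule")
```

At the reported peak, this works out to roughly 18.9 fJ per INT8 operation, and 0.22 µJ/token corresponds to roughly 4.5 million tokens per joule.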
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Network Packet Processing and Optimization
