A 28nm 0.22 µJ/token memory-compute-intensity-aware CNN-Transformer accelerator with hybrid-attention-based layer-fusion and cascaded pruning for semantic-segmentation

Pingcheng Dong; Yonghao Tan; Xuejiao Liu; Peng Luo; Yu Liu; Luhong Liang; Yitong Zhou; Di Pang; Man-To Yung; Dong Zhang; Xijie Huang; Shih-Yang Liu; Yongkun Wu; Fengshi Tian; Chi-Ying Tsui; Fengbin Tu; Kwang-Ting Cheng

arXiv:2512.17555 · eess.IV · January 5, 2026

Open Access

TL;DR

This paper introduces a 28nm CNN-Transformer accelerator for semantic segmentation that combines a hybrid attention unit, a layer-fusion scheduler, and a cascaded feature-map pruner, reducing energy by 3.86× to 10.91× relative to previous designs.

Contribution

It presents a 28nm, 13.93 mm² accelerator for semantic segmentation built around hybrid attention, layer fusion, and cascaded pruning, with a reported peak energy efficiency of 52.90 TOPS/W (INT8).

Findings

01

Achieves 3.86 to 10.91 times energy reduction compared to prior designs.

02

Attains peak energy efficiency of 52.90 TOPS/W (INT8).

03

Features a hybrid attention unit, a layer-fusion scheduler, and a cascaded feature-map pruner.

Abstract

This work presents a 28nm 13.93 mm² CNN-Transformer accelerator for semantic segmentation, achieving 3.86× to 10.91× energy reduction over previous designs. It features a hybrid attention unit, layer-fusion scheduler, and cascaded feature-map pruner, with peak energy efficiency of 52.90 TOPS/W (INT8).
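
To put the headline figures in more familiar units, the short Python sketch below converts the reported peak efficiency (52.90 TOPS/W) into energy per operation and scales the 0.22 µJ/token figure from the title to a token count. The function names and the 1024-token workload are hypothetical, chosen only for illustration. The peak-efficiency and per-token numbers may have been measured under different operating conditions, so the two results should not be combined.

```python
def energy_per_op_fj(tops_per_watt: float) -> float:
    """Convert an efficiency in TOPS/W to femtojoules per operation."""
    # 1 TOPS/W is 1e12 operations per joule, so one operation costs
    # 1 / (tops_per_watt * 1e12) joules.
    joules_per_op = 1.0 / (tops_per_watt * 1e12)
    return joules_per_op * 1e15  # joules -> femtojoules


def energy_for_tokens_uj(uj_per_token: float, num_tokens: int) -> float:
    """Scale a per-token energy (in microjoules) to a whole workload."""
    return uj_per_token * num_tokens


PEAK_TOPS_PER_W = 52.90     # reported peak efficiency (INT8)
UJ_PER_TOKEN = 0.22         # per-token energy from the paper title
HYPOTHETICAL_TOKENS = 1024  # assumed token count, not from the paper

print(f"{energy_per_op_fj(PEAK_TOPS_PER_W):.2f} fJ per operation at peak")
print(f"{energy_for_tokens_uj(UJ_PER_TOKEN, HYPOTHETICAL_TOKENS):.2f} uJ "
      f"for {HYPOTHETICAL_TOKENS} tokens")
```

At the reported peak this works out to roughly 18.9 fJ per INT8 operation, which is the reciprocal of 52.90 expressed in picojoules and then converted to femtojoules.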

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Network Packet Processing and Optimization