HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation

Guoan Xu, Wenjing Jia, Tao Wu, Ligeng Chen, and Guangwei Gao

arXiv:2407.07441·cs.CV·July 12, 2024

TL;DR

HAFormer is a lightweight semantic segmentation model that effectively combines hierarchical CNN features with Transformer-based global context modeling, achieving high accuracy and speed with minimal computational cost.

Contribution

The paper introduces HAFormer, integrating a Hierarchy-Aware Pixel-Excitation module, an Efficient Transformer, and a correlation-weighted Fusion module for improved lightweight segmentation.

Findings

01. Achieves 74.2% mIoU on Cityscapes at a high frame rate
02. Outperforms existing lightweight models in accuracy and efficiency
03. Maintains low computational overhead with a compact model size
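
For readers unfamiliar with the metric quoted in the findings, mIoU (mean intersection-over-union) averages, over classes, the ratio of correctly labeled pixels to all pixels predicted as or labeled with that class. The sketch below is a minimal illustration, not the paper's evaluation code: the function name `mean_iou`, the flat label lists, and the ignore label of 255 are assumptions made for the example, and benchmark scores are normally accumulated over a whole test set rather than a single image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that appear in pred or target.

    pred and target are flat sequences of integer class labels.
    ignore_index is an assumed "void" label that is skipped.
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six pixels, three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18.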

Abstract

Both Convolutional Neural Networks (CNNs) and Transformers have shown great success in semantic segmentation tasks. Efforts have been made to integrate CNNs with Transformer models to capture both local and global context interactions. However, there is still room for improvement, particularly under constrained computational resources. In this paper, we introduce HAFormer, a model that combines the hierarchical feature extraction ability of CNNs with the global dependency modeling capability of Transformers to tackle lightweight semantic segmentation challenges. Specifically, we design a Hierarchy-Aware Pixel-Excitation (HAPE) module for adaptive multi-scale local feature extraction. For global perception modeling, we devise an Efficient Transformer (ET) module that streamlines the quadratic calculations associated with traditional Transformers. Moreover, a…

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Adam · Dropout