Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention

Gary Leung; Jun Gao; Xiaohui Zeng; Sanja Fidler

arXiv:2207.02126·cs.CV·July 6, 2022

TL;DR

This paper introduces Hierarchical Inter-Level Attention (HILA), a novel attention mechanism that enhances transformer-based image segmentation by enabling bidirectional feature updates across different levels, improving boundary localization and semantic understanding.

Contribution

HILA extends hierarchical vision transformers with local inter-level connections, allowing iterative bottom-up and top-down feature updates without altering the base architecture.

Findings

01

Improves semantic segmentation accuracy on benchmark datasets.

02

Reduces parameter count and FLOPs compared to existing methods.

03

Easily integrates into popular hierarchical transformer architectures.

Abstract

Existing transformer-based image backbones typically propagate feature information in one direction, from lower to higher levels. This may not be ideal, since the localization ability needed to delineate accurate object boundaries is most prominent in the lower, high-resolution feature maps, while the semantics that can disambiguate image signals belonging to one object versus another typically emerge at a higher level of processing. We present Hierarchical Inter-Level Attention (HILA), an attention-based method that captures Bottom-Up and Top-Down Updates between features of different levels. HILA extends hierarchical vision transformer architectures by adding local connections between features of higher and lower levels to the backbone encoder. In each iteration, we construct a hierarchy by having higher-level features compete for assignments to update lower-level features belonging to them,…
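The top-down step described in the abstract, where higher-level features compete for assignment to each lower-level feature, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `top_down_update`, the projection matrices `w_q`, `w_k`, `w_v`, the residual update, and the choice of lower-level features as queries are all assumptions made for illustration. The sketch also attends globally, whereas the paper restricts connections to local neighbourhoods.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def top_down_update(low, high, w_q, w_k, w_v):
    """Hypothetical top-down inter-level attention step.

    low:  (N, d) lower-level (high-resolution) features
    high: (M, d) higher-level (low-resolution) features
    w_q, w_k, w_v: (d, d) projection matrices (assumed, not from the paper)

    Returns the updated lower-level features and the (N, M) assignment map.
    """
    q = low @ w_q    # queries come from the lower level (assumption)
    k = high @ w_k   # keys come from the higher level
    v = high @ w_v   # values come from the higher level

    logits = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) similarity scores

    # Softmax over the higher-level axis: for each lower-level feature,
    # the higher-level features compete and their weights sum to 1.
    assign = softmax(logits, axis=1)

    # Residual update of the lower level with higher-level information.
    updated_low = low + assign @ v
    return updated_low, assign


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    low = rng.normal(size=(16, d))   # e.g. a flattened 4x4 fine feature map
    high = rng.normal(size=(4, d))   # e.g. a flattened 2x2 coarse feature map
    w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
    updated, assign = top_down_update(low, high, w_q, w_k, w_v)
    print(updated.shape, assign.shape)
```

Normalising over the higher-level axis is what makes the higher-level features compete: each lower-level feature distributes a fixed budget of attention among its candidate parents. A bottom-up step would aggregate in the opposite direction, but the abstract excerpt is truncated before describing it, so it is left out here.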

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Adam · Dropout · Layer Normalization · Convolution