Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation

Loic Themyr; Clement Rambour; Nicolas Thome; Toby Collins; Alexandre Hostettler

arXiv:2212.07890·cs.CV·December 16, 2022

TL;DR

This paper introduces GLAM, a global attention module for multi-resolution transformers that enhances semantic segmentation by modeling interactions across all image regions, leading to improved performance on multiple datasets.

Contribution

The paper proposes GLAM, a novel global token-based attention module that can be integrated into existing transformers to capture global interactions in semantic segmentation.

Findings

1. GLAM-Swin and GLAM-Swin-UNet outperform their vanilla counterparts on ADE20K and Cityscapes.
2. GLAM achieves state-of-the-art results on the BCV medical imaging dataset.
3. GLAM enhances segmentation of large 3D medical images.

Abstract

Transformers have proved to be very effective for visual recognition tasks. In particular, vision transformers construct compressed global representations through self-attention and learnable class tokens. Multi-resolution transformers have shown recent successes in semantic segmentation but can only capture local interactions in high-resolution feature maps. This paper extends the notion of global tokens to build GLobal Attention Multi-resolution (GLAM) transformers. GLAM is a generic module that can be integrated into most existing transformer backbones. GLAM includes learnable global tokens, which, unlike previous methods, can model interactions between all image regions, and extracts powerful representations during training. Extensive experiments show that GLAM-Swin and GLAM-Swin-UNet exhibit substantially better performance than their vanilla counterparts on ADE20K and Cityscapes.…
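To make the idea of global tokens more concrete, here is a minimal sketch of how tokens attached to local windows could relay information between all image regions. It is not the authors' implementation: the function names `attend` and `global_token_block` are hypothetical, the attention is single-head with identity projections (no learned weights, no positional bias), and the two-step order (window attention, then attention among global tokens) is an assumption made for illustration.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens):
    # Single-head self-attention with identity projections (Q = K = V = tokens).
    # Simplifying assumption: a real module would use learned projections.
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def global_token_block(windows, global_tokens):
    # windows: one list of patch vectors per window.
    # global_tokens: one list of global token vectors per window (hypothetical layout).
    n_g = len(global_tokens[0])
    # Step 1: attention inside each window over [global tokens ; patches].
    mixed = [attend(g + w) for g, w in zip(global_tokens, windows)]
    new_globals = [m[:n_g] for m in mixed]
    new_windows = [m[n_g:] for m in mixed]
    # Step 2: global tokens from all windows attend to each other,
    # so each one now carries information from every window.
    flat = attend([t for g in new_globals for t in g])
    new_globals = [flat[i * n_g:(i + 1) * n_g] for i in range(len(windows))]
    return new_windows, new_globals


if __name__ == "__main__":
    random.seed(0)
    dim, n_patches, n_global, n_windows = 4, 3, 2, 2
    wins = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_patches)]
            for _ in range(n_windows)]
    globs = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_global)]
             for _ in range(n_windows)]
    out_wins, out_globs = global_token_block(wins, globs)
    print(len(out_wins), len(out_globs))
```

In this sketch, a change to a patch in one window alters the global tokens of every other window after a single block, while the patches of other windows only see that change once a second block is applied. That is one way a window-based backbone could gain image-wide context without computing attention over all patch pairs.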

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications