Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation

Deyi Ji; Feng Zhao; Hongtao Lu

arXiv:2307.00711 · cs.CV · July 7, 2023

TL;DR

The paper introduces GPWFormer, a novel Transformer-CNN hybrid framework for ultra-high resolution segmentation that balances memory efficiency and local detail capture through dynamic patch grouping and multi-scale processing.

Contribution

It proposes a mutual learning framework with a lightweight Wavelet Transformer and patch grouping guided by CNN masks, improving UHR segmentation performance and efficiency.

Findings

01. Outperforms existing methods on five benchmark datasets.

02. Achieves high inference speed with low computational complexity.

03. Effectively captures both local details and long-range dependencies.

Abstract

Most existing ultra-high resolution (UHR) segmentation methods struggle to balance memory cost against local characterization accuracy. Our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer) takes both into account and achieves impressive performance. GPWFormer is a Transformer ($T$)-CNN ($C$) mutual learning framework, where $T$ takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies, while $C$ takes the downsampled image as input to learn the category-wise deep context. For high inference speed and low computational complexity, $T$ partitions the original UHR image into patches and groups them dynamically, then learns the low-level local details with the lightweight multi-head Wavelet Transformer…
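The patch partitioning and guided grouping described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `group_patches`, the parameters `scale` and `patch`, and in particular the grouping rule (majority label of the coarse mask under each patch) are assumptions made for illustration, since the truncated abstract does not state the actual criterion. The coarse mask stands in for the prediction of the CNN branch on the downsampled image.

```python
from collections import Counter, defaultdict


def group_patches(coarse_mask, scale, patch):
    """Group full-resolution patches by the majority label of a coarse mask.

    coarse_mask: low-resolution label map (list of lists), a stand-in for
                 the CNN branch's prediction on the downsampled image.
    scale:       downsampling factor between the UHR image and the mask.
    patch:       patch side length in full-resolution pixels.

    Assumes patch is a multiple of scale and the mask tiles evenly.
    Returns {label: [(top, left), ...]} with full-resolution patch corners.
    """
    cells = patch // scale  # mask cells covered by one patch side
    rows, cols = len(coarse_mask), len(coarse_mask[0])
    groups = defaultdict(list)
    for r in range(0, rows, cells):
        for c in range(0, cols, cells):
            # Collect the coarse labels that fall under this patch.
            labels = [
                coarse_mask[i][j]
                for i in range(r, r + cells)
                for j in range(c, c + cells)
            ]
            # Hypothetical grouping rule: the most frequent label wins.
            majority = Counter(labels).most_common(1)[0][0]
            groups[majority].append((r * scale, c * scale))
    return dict(groups)


# A 4x4 coarse mask for a 16x16 image (scale 4), split into 8x8 patches.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 2, 2],
    [0, 0, 2, 2],
]
groups = group_patches(mask, scale=4, patch=8)
```

In this toy setting each group could then be processed as one batch by a shared lightweight module, which is one plausible reading of why grouping would help inference speed; the paper itself should be consulted for the actual design.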

Taxonomy

Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection