Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation
Deyi Ji, Feng Zhao, Hongtao Lu

TL;DR
The paper introduces GPWFormer, a novel Transformer-CNN hybrid framework for ultra-high resolution segmentation that balances memory efficiency and local detail capture through dynamic patch grouping and multi-scale processing.
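One way to picture the multi-scale processing is through the abstract's description of the two inputs. The Transformer branch works on the full-resolution image split into patches, and the CNN branch works on a downsampled copy. The sketch below is a minimal, hypothetical illustration and not the authors' code. `partition_into_patches` and `downsample` are illustrative names. The image is assumed to be a single-channel nested list, and average pooling is an assumed choice of downsampling operator.

```python
from typing import List

Image = List[List[float]]  # hypothetical: single-channel H x W image as nested lists


def partition_into_patches(img: Image, p: int) -> List[Image]:
    """Split an H x W image into non-overlapping p x p patches in row-major order.

    Assumes H and W are divisible by p (a simplification for this sketch).
    """
    h, w = len(img), len(img[0])
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by the patch size")
    patches = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            # Slice p rows, then p columns from each of those rows.
            patches.append([row[left:left + p] for row in img[top:top + p]])
    return patches


def downsample(img: Image, s: int) -> Image:
    """Average-pool by factor s to build a low-resolution global view.

    Trailing rows/columns that do not fill a full s x s window are dropped.
    """
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[i + di][j + dj] for di in range(s) for dj in range(s)) / (s * s)
            for j in range(0, w - w % s, s)
        ]
        for i in range(0, h - h % s, s)
    ]
```

For a 4 x 4 image holding the values 0 to 15 in row-major order, `partition_into_patches(img, 2)` yields four 2 x 2 patches, and `downsample(img, 2)` returns `[[2.5, 4.5], [10.5, 12.5]]`.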
Contribution
It proposes a mutual learning framework with a lightweight Wavelet Transformer and patch grouping guided by CNN masks, improving UHR segmentation performance and efficiency.
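This page does not state the exact criterion by which CNN masks guide the patch grouping. The sketch below shows one plausible version under stated assumptions. The CNN branch is assumed to produce a coarse class-label mask over the downsampled image. Each patch of the UHR image is mapped to the matching region of that mask, and patches are then grouped by the most frequent label in their region. The function name `group_patches_by_mask` and the majority-vote rule are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def group_patches_by_mask(
    coarse_mask: List[List[int]], grid_h: int, grid_w: int
) -> Dict[int, List[int]]:
    """Group patch indices by the dominant class in a coarse CNN mask (hypothetical rule).

    Patches are indexed row-major over a grid_h x grid_w patch grid. Each patch
    corresponds to an equally sized region of the coarse mask, so the mask
    dimensions must be divisible by the grid dimensions.
    """
    mh, mw = len(coarse_mask), len(coarse_mask[0])
    if mh % grid_h or mw % grid_w:
        raise ValueError("mask dimensions must be divisible by the patch grid")
    rh, rw = mh // grid_h, mw // grid_w  # mask region covered by one patch
    groups: Dict[int, List[int]] = defaultdict(list)
    for gi in range(grid_h):
        for gj in range(grid_w):
            # Count class labels inside this patch's region of the coarse mask.
            labels = Counter(
                coarse_mask[gi * rh + di][gj * rw + dj]
                for di in range(rh)
                for dj in range(rw)
            )
            dominant = labels.most_common(1)[0][0]  # ties go to the first label seen
            groups[dominant].append(gi * grid_w + gj)
    return dict(groups)
```

Under this assumption, patches that share a dominant class land in the same group and could be batched through a shared Transformer pass. This is one way a CNN prior could steer where the Transformer spends its computation.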
Findings
Outperforms existing methods on five benchmark datasets.
Achieves high inference speed with low computational complexity.
Effectively captures both local details and long-range dependencies.
Abstract
Most existing ultra-high resolution (UHR) segmentation methods struggle with the dilemma of balancing memory cost against local characterization accuracy. Our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer) takes both into account and achieves impressive performance. GPWFormer is a Transformer-CNN mutual learning framework. The Transformer branch takes the whole UHR image as input and captures both local details and fine-grained long-range contextual dependencies. The CNN branch takes a downsampled image as input and learns category-wise deep context. For high inference speed and low computational complexity, the Transformer branch partitions the original UHR image into patches, groups them dynamically, and then learns the low-level local details with the lightweight multi-head Wavelet Transformer…
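The truncated abstract does not explain how the Wavelet Transformer uses wavelets. One plausible role is token reduction. A one-level 2D Haar transform halves each spatial dimension and splits the input into four sub-bands (LL, LH, HL, HH). Because the transform is exactly invertible, no information is lost, so attention could run over fewer spatial positions. The sketch below is a generic orthonormal Haar transform and its inverse. It is not the authors' WFormer layer, the function names are hypothetical, and sub-band naming conventions differ between libraries.

```python
from typing import Dict, List

Image = List[List[float]]  # hypothetical: single-channel H x W image as nested lists


def haar_dwt2(img: Image) -> Dict[str, Image]:
    """One-level orthonormal 2D Haar transform; H and W must be even.

    Returns four (H/2) x (W/2) sub-bands. For each 2 x 2 block [[a, b], [c, d]],
    LL is the scaled average and LH/HL/HH are the scaled differences.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands: Dict[str, Image] = {k: [] for k in ("LL", "LH", "HL", "HH")}
    for i in range(0, h, 2):
        rows: Dict[str, List[float]] = {k: [] for k in bands}
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows["LL"].append((a + b + c + d) / 2)
            rows["LH"].append((a - b + c - d) / 2)  # left-right difference
            rows["HL"].append((a + b - c - d) / 2)  # top-bottom difference
            rows["HH"].append((a - b - c + d) / 2)  # diagonal difference
        for k in bands:
            bands[k].append(rows[k])
    return bands


def haar_idwt2(bands: Dict[str, Image]) -> Image:
    """Exact inverse of haar_dwt2: rebuilds each 2 x 2 block from its four coefficients."""
    ll, lh, hl, hh = bands["LL"], bands["LH"], bands["HL"], bands["HH"]
    out: Image = []
    for i in range(len(ll)):
        top: List[float] = []
        bottom: List[float] = []
        for j in range(len(ll[0])):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        out += [top, bottom]
    return out
```

For a 4 x 4 input, each sub-band is 2 x 2, so a layer attending over sub-band positions sees 4 positions instead of 16. The round trip `haar_idwt2(haar_dwt2(img))` recovers the original image. Strided pooling, by contrast, discards the high-frequency detail.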
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Advanced Image and Video Retrieval Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection
