AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration
Hongyi Cai, Mohammad Mahdinur Rahman, Mohammad Shahid Akhtar, Jie Li, Jingyu Wu, Zhili Fang

TL;DR
AgileIR introduces a memory-efficient and faster attention mechanism for image restoration transformers by decomposing attention into groups, maintaining performance while significantly reducing memory usage.
Contribution
The paper proposes Group Shifted Window Attention (GSWA), a novel sparse attention method that reduces memory consumption and accelerates training in image restoration transformers.
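As a rough illustration of what "decomposing attention into groups across heads" can mean, the NumPy sketch below computes window attention once over all heads and once group by group. It is a minimal sketch under stated assumptions, not the paper's implementation: the helper names (`window_attention`, `grouped_window_attention`), the tensor shapes, and the choice to split heads into equal contiguous groups are all hypothetical, and this summary does not specify how GSWA realizes its savings during back propagation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(q, k, v, bias):
    """Multi-head attention inside one window.

    q, k, v: (heads, tokens, head_dim); bias: (heads, tokens, tokens).
    """
    scale = q.shape[-1] ** -0.5
    attn = q @ k.transpose(0, 2, 1) * scale + bias  # (heads, tokens, tokens)
    return softmax(attn) @ v


def grouped_window_attention(q, k, v, bias, num_groups):
    """Hypothetical helper: run the same attention one head group at a time."""
    parts = [np.array_split(t, num_groups, axis=0) for t in (q, k, v, bias)]
    # Each call only materializes an attention map for its own group of heads.
    outs = [window_attention(qg, kg, vg, bg) for qg, kg, vg, bg in zip(*parts)]
    return np.concatenate(outs, axis=0)


rng = np.random.default_rng(0)
heads, tokens, head_dim = 6, 16, 8  # assumed sizes: a 4x4 window, 6 heads
q, k, v = (rng.standard_normal((heads, tokens, head_dim)) for _ in range(3))
bias = rng.standard_normal((heads, tokens, tokens))

full = window_attention(q, k, v, bias)
grouped = grouped_window_attention(q, k, v, bias, num_groups=3)
print(np.allclose(full, grouped))
```

In this forward-only sketch the two outputs agree up to floating-point tolerance, and the largest attention map held at once shrinks from (heads, tokens, tokens) to (heads / num_groups, tokens, tokens). Whether and how that carries over to training memory depends on details of GSWA that are not given here.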
Findings
AgileIR reduces memory usage by over 50% compared to baseline models.
Maintains restoration quality at 32.20 dB on the Set5 dataset.
Speeds up training with negligible performance loss.
Abstract
Image Transformers have shown remarkable success in image restoration tasks. Nevertheless, most transformer-based models are strictly bounded by exorbitant memory occupancy. Our goal is to reduce the memory consumption of Swin Transformer and, at the same time, speed up the model during training. Thus, we introduce AgileIR, a group shifted attention mechanism combined with window attention, which sparsely simplifies the model architecture. We propose Group Shifted Window Attention (GSWA) to decompose Shifted Window Multi-head Self-Attention (SW-MSA) and Window Multi-head Self-Attention (W-MSA) into groups across their attention heads, which shrinks memory usage in back propagation. In addition, we keep shifted window masking and its shifted learnable biases during training, in order to induce the model to interact across windows within the channel. We also…
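The shifted window masking that the abstract says is retained can be sketched as follows, assuming it follows the usual Swin Transformer recipe: cyclically shift the feature map, partition it into windows, and mask attention between tokens that were not neighbours before the shift. The function names, the 8x8 map, the window size of 4, the shift of 2, and the -100 fill value are assumptions chosen for illustration, not details taken from this paper.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W) map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws * ws).
    """
    H, W = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws)
    return x.transpose(0, 2, 1, 3).reshape(-1, ws * ws)


def shifted_window_mask(H, W, ws, shift):
    """Additive attention mask for windows taken after a cyclic shift."""
    # Label the 3 x 3 regions that the cyclic shift stitches together.
    region = np.zeros((H, W), dtype=np.int64)
    bands = (slice(0, -ws), slice(-ws, -shift), slice(-shift, None))
    label = 0
    for rows in bands:
        for cols in bands:
            region[rows, cols] = label
            label += 1
    ids = window_partition(region, ws)  # (num_windows, ws * ws)
    # Token pairs from different regions get a large negative bias.
    differs = ids[:, :, None] != ids[:, None, :]
    return np.where(differs, -100.0, 0.0)


feat = np.arange(64, dtype=np.float64).reshape(8, 8)
shifted = np.roll(feat, shift=(-2, -2), axis=(0, 1))  # cyclic shift up and left
mask = shifted_window_mask(8, 8, ws=4, shift=2)
print(mask.shape)  # (4, 16, 16)
```

The mask would be added to the attention logits of each window before the softmax, so that tokens wrapped around by the cyclic shift do not attend to one another. The top-left window contains a single region and is left unmasked.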
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced Image Processing Techniques · Medical Imaging Techniques and Applications
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Adam
