IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions
Zhijun Tu, Kunpeng Du, Hanting Chen, Hailing Wang, Wei Li, Jie Hu, Yunhe Wang

TL;DR
IPT-V2 introduces a hierarchical attention-based transformer architecture that effectively captures both local and global dependencies, leading to state-of-the-art performance in multiple image restoration tasks and image generation.
Contribution
The paper proposes a novel hierarchical attention mechanism with focal context and global grid self-attentions, enhancing global-local dependency modeling in image processing transformers.
Findings
Achieves state-of-the-art results in denoising, deblurring, and deraining.
Provides a better trade-off between performance and computational complexity.
Outperforms previous methods in image generation tasks.
Abstract
Recent advances have demonstrated the powerful capability of the transformer architecture in image restoration. However, our analysis indicates that existing transformer-based methods cannot establish both exact global and local dependencies simultaneously, which are critical for restoring the details and missing content of degraded images. To this end, we present an efficient image processing transformer architecture with hierarchical attentions, called IPT-V2, which adopts a focal context self-attention (FCSA) and a global grid self-attention (GGSA) to obtain adequate token interactions in local and global receptive fields. Specifically, FCSA applies the shifted window mechanism to channel self-attention, which helps capture the local context and mutual interaction across channels. GGSA constructs long-range dependencies in the cross-window grid and aggregates global information in spatial…
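The difference between a local window and a cross-window grid can be illustrated with a small token-grouping sketch. This is not the authors' implementation: it assumes a MaxViT-style grid partition, in which each group collects the tokens at the same offset inside every window, and the function names (window_groups, grid_groups, span) and the 8x8 map with window size 4 are hypothetical choices made only for illustration. No attention is computed; the sketch only shows which tokens would attend to each other under each scheme.

```python
from collections import defaultdict


def window_groups(height, width, window):
    """Group token coordinates into non-overlapping window x window blocks.

    Tokens in one group are spatial neighbours (local receptive field).
    """
    groups = defaultdict(list)
    for i in range(height):
        for j in range(width):
            groups[(i // window, j // window)].append((i, j))
    return dict(groups)


def grid_groups(height, width, window):
    """Group tokens that share the same offset inside their window.

    Assumption: MaxViT-style grid partition. Each group takes one token
    from every window, so its members are spread across the whole map
    with a stride equal to the window size (global receptive field).
    """
    groups = defaultdict(list)
    for i in range(height):
        for j in range(width):
            groups[(i % window, j % window)].append((i, j))
    return dict(groups)


def span(tokens):
    """Return the (rows, cols) extent of the bounding box of a token group."""
    rows = [i for i, _ in tokens]
    cols = [j for _, j in tokens]
    return (max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)


local_groups = window_groups(8, 8, 4)
cross_window = grid_groups(8, 8, 4)

print(local_groups[(0, 0)][:4])    # first tokens of the top-left window
print(cross_window[(0, 0)])        # one token from each of the four windows
print(span(local_groups[(0, 0)]))  # extent covered by a local group
print(span(cross_window[(0, 0)]))  # extent covered by a grid group
```

With these settings a local group holds 16 adjacent tokens, while a grid group holds only 4 tokens that are spaced one window apart, which is why a grid-style attention can reach across the feature map at a cost comparable to window attention.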
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Image and Signal Denoising Methods · Neural Networks and Applications
Methods: Diffusion
