Activating Wider Areas in Image Super-Resolution
Cheng Cheng, Hang Wang, Hongbin Sun

TL;DR
This paper explores the use of Vision Mamba, a state space model, for image super-resolution, introducing new techniques that improve performance and efficiency over existing CNN and ViT-based methods.
Contribution
The paper shows how to use Vision Mamba effectively in SISR through three recipes: integration into a MetaFormer-style block, pre-training on a larger and broader dataset, and an attention mechanism. Together these achieve competitive results at lower computational cost.
Findings
MMA achieves a +0.5 dB PSNR gain on the Manga109 dataset.
MMA maintains low memory and computational overhead.
Vision Mamba proves versatile in lightweight SR applications.
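To put the PSNR figure in the findings above in context, here is a minimal sketch of how PSNR is computed and what a 0.5 dB gain means in terms of mean squared error. It is not the paper's evaluation code: the function names are hypothetical, images are plain nested lists of 8-bit values, and the sketch ignores the evaluation protocol (SR benchmarks often score the luminance channel after cropping borders, and this summary does not say which protocol was used).

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            total += (r - e) ** 2
            count += 1
    return total / count


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    err = mse(reference, estimate)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db decibels."""
    return 10 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # Hypothetical 2x2 patches, for illustration only.
    hr = [[52, 60], [61, 59]]
    sr = [[50, 62], [60, 59]]
    print(f"PSNR: {psnr(hr, sr):.2f} dB")
    print(f"MSE ratio for +0.5 dB: {mse_ratio_for_gain(0.5):.4f}")
```

Because PSNR is logarithmic in the error, a fixed dB gain corresponds to a fixed multiplicative reduction in MSE: +0.5 dB means the MSE falls to about 89% of its previous value, roughly an 11% reduction.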
Abstract
The prevalence of convolutional neural networks (CNNs) and vision transformers (ViTs) has markedly revolutionized the area of single-image super-resolution (SISR). To further boost SR performance, several techniques, such as residual learning and attention mechanisms, have been introduced; their gains can be largely attributed to a wider activated area, that is, the set of input pixels that strongly influence the SR results. However, whether SR performance can be further improved through another versatile vision backbone remains an unresolved question. To address this issue, in this paper, we unleash the representation potential of the modern state space model, i.e., Vision Mamba (Vim), in the context of SISR. Specifically, we present three recipes for better utilization of Vim-based models: 1) Integration into a MetaFormer-style block; 2) Pre-training on a larger and broader dataset; 3)…
Taxonomy
Topics: Advanced Image Processing Techniques · Image Processing Techniques and Applications
Methods: Convolution
