LMLT: Low-to-high Multi-Level Vision Transformer for Image Super-Resolution
Jeongsoo Kim, Jongho Nang, Junsuk Choe

TL;DR
LMLT introduces a multi-level vision transformer that efficiently captures local and global features for image super-resolution, reducing complexity and memory usage while maintaining or surpassing state-of-the-art performance.
Contribution
The paper proposes a novel multi-level transformer architecture that addresses window boundary issues and reduces computational complexity in image super-resolution tasks.
Findings
Significantly reduces inference time and GPU memory usage relative to existing ViT-based super-resolution methods.
Maintains or exceeds the performance of existing ViT-based methods.
Effectively captures both local and global image features.
Abstract
Recent Vision Transformer (ViT)-based methods for Image Super-Resolution have demonstrated impressive performance. However, they suffer from significant complexity, resulting in high inference times and memory usage. Additionally, ViT models using Window Self-Attention (WSA) face challenges in processing regions outside their windows. To address these issues, we propose the Low-to-high Multi-Level Transformer (LMLT), which employs attention with varying feature sizes for each head. LMLT divides image features along the channel dimension, gradually reduces spatial size for lower heads, and applies self-attention to each head. This approach effectively captures both local and global information. By integrating the results from lower heads into higher heads, LMLT overcomes the window boundary issues in self-attention. Extensive experiments show that our model significantly reduces…
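The channel split and per-head spatial reduction described in the abstract can be illustrated with a small shape-and-cost sketch. This is not the authors' implementation. The halving factor per level (`scale=2`), the window size (`window=8`), the use of window attention inside every head, and all function names are assumptions made for illustration. The sketch omits the pooling operator, the low-to-high fusion between heads, and all learned projections.

```python
from math import ceil


def head_plan(channels, height, width, num_heads=4, scale=2, window=8):
    """Per-head feature shapes after a channel split and spatial downsampling.

    Head 0 is the lowest level (smallest map); the last head keeps full
    resolution. `scale` and `window` are illustrative assumptions, not
    values taken from the paper.
    """
    if channels % num_heads != 0:
        raise ValueError("channels must be divisible by num_heads")
    dim = channels // num_heads  # channels handled by each head
    plan = []
    for i in range(num_heads):
        factor = scale ** (num_heads - 1 - i)  # downsampling factor for head i
        plan.append({
            "head": i,
            "dim": dim,
            "height": max(1, height // factor),
            "width": max(1, width // factor),
            # one window on the reduced map spans this many input pixels per side
            "effective_window": window * factor,
        })
    return plan


def window_attention_cost(height, width, dim, window=8):
    """Approximate multiply-adds for window self-attention on one feature map."""
    tokens = window * window
    num_windows = ceil(height / window) * ceil(width / window)
    # QK^T and (attention @ V) each take about tokens^2 * dim multiply-adds
    return num_windows * 2 * tokens * tokens * dim


def compare_costs(channels=32, height=64, width=64, num_heads=4, scale=2, window=8):
    """Compare the multi-level layout with running every head at full resolution."""
    plan = head_plan(channels, height, width, num_heads, scale, window)
    multi_level = sum(
        window_attention_cost(p["height"], p["width"], p["dim"], window) for p in plan
    )
    full_res = num_heads * window_attention_cost(
        height, width, channels // num_heads, window
    )
    return plan, multi_level, full_res


if __name__ == "__main__":
    plan, multi_level, full_res = compare_costs()
    for p in plan:
        print(p)
    print("multi-level / full-resolution cost:", multi_level / full_res)
```

Under these assumed settings, the lowest head's single window spans the entire 64×64 input, while the attention cost is roughly one third of running all four heads at full resolution. These numbers follow only from the assumptions above and are not results reported in the paper.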
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Image Processing Techniques · CCD and CMOS Imaging Sensors
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Softmax · Label Smoothing · Linear Layer · Adam · Dropout · Layer Normalization
