LMLT: Low-to-high Multi-Level Vision Transformer for Image Super-Resolution

Jeongsoo Kim; Jongho Nang; Junsuk Choe

arXiv:2409.03516 · cs.CV · September 6, 2024 · 2 cites

Open Access · 1 Repo

TL;DR

LMLT introduces a multi-level vision transformer that efficiently captures local and global features for image super-resolution, reducing complexity and memory usage while maintaining or surpassing state-of-the-art performance.

Contribution

The paper proposes a novel multi-level transformer architecture that addresses window boundary issues and reduces computational complexity in image super-resolution tasks.

Findings

01. Significantly reduces inference time and GPU memory usage.

02. Maintains or exceeds the performance of existing ViT-based methods.

03. Effectively captures both local and global image features.

Abstract

Recent Vision Transformer (ViT)-based methods for Image Super-Resolution have demonstrated impressive performance. However, they suffer from significant complexity, resulting in high inference times and memory usage. Additionally, ViT models using Window Self-Attention (WSA) face challenges in processing regions outside their windows. To address these issues, we propose the Low-to-high Multi-Level Transformer (LMLT), which employs attention with varying feature sizes for each head. LMLT divides image features along the channel dimension, gradually reduces spatial size for lower heads, and applies self-attention to each head. This approach effectively captures both local and global information. By integrating the results from lower heads into higher heads, LMLT overcomes the window boundary issues in self-attention. Extensive experiments show that our model significantly reduces…
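
The channel split and per-head spatial reduction described in the abstract can be illustrated with a small shape-bookkeeping sketch. It is not the authors' implementation: the names `HeadPlan`, `plan_heads`, and `window_extent` are hypothetical, and the choices of four heads, a halving of spatial size per level, and a fixed 8x8 attention window are assumptions made only for illustration, since the abstract does not state them.

```python
from dataclasses import dataclass


@dataclass
class HeadPlan:
    level: int      # 0 is the lowest head (smallest feature map)
    channels: int   # channels assigned to this head after the split
    height: int     # spatial height seen by this head
    width: int      # spatial width seen by this head

    @property
    def tokens(self) -> int:
        # Number of spatial positions this head attends over.
        return self.height * self.width


def plan_heads(channels, height, width, num_heads=4, scale=2):
    """Split channels evenly and shrink the spatial size for lower heads.

    Assumption (not from the paper): each level is `scale` times smaller
    than the level above it, and the top head keeps the full resolution.
    """
    if channels % num_heads:
        raise ValueError("channels must divide evenly across heads")
    per_head = channels // num_heads
    plans = []
    for level in range(num_heads):
        factor = scale ** (num_heads - 1 - level)
        plans.append(
            HeadPlan(level, per_head,
                     max(1, height // factor), max(1, width // factor))
        )
    return plans


def window_extent(plan, full_height, window=8):
    """Rows of the full-resolution map covered by one window at this level.

    Assumption: every head uses the same window size, so a window on a
    smaller map corresponds to a larger region of the original features.
    """
    factor = full_height // plan.height
    return min(full_height, window * factor)


for plan in plan_heads(channels=64, height=64, width=64):
    print(plan.level, plan.channels, f"{plan.height}x{plan.width}",
          plan.tokens, window_extent(plan, 64))
```

Under these assumed settings, a window in the lowest head spans the whole 64-row feature map while a window in the highest head spans only 8 rows, which is one way to read the abstract's claim that lower heads supply global context and higher heads supply local detail.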

Code & Models

Repositories

jwgdmkj/lmlt (PyTorch, official)

Taxonomy

Topics: Image Processing Techniques and Applications · Advanced Image Processing Techniques · CCD and CMOS Imaging Sensors

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Softmax · Label Smoothing · Linear Layer · Adam · Dropout · Layer Normalization