HiTSR: A Hierarchical Transformer for Reference-based Super-Resolution
Masoomeh Aslahishahri, Jordan Ubbens, Ian Stavness

TL;DR
HiTSR is a hierarchical transformer that super-resolves low-resolution images with the help of high-resolution reference images, achieving state-of-the-art results with a streamlined, attention-based architecture.
Contribution
The paper presents a hierarchical transformer for reference-based super-resolution that replaces the multi-network, multi-stage designs of existing methods with a streamlined, attention-based architecture and training pipeline.
Findings
Achieves state-of-the-art PSNR/SSIM on the SUN80 dataset.
Outperforms existing methods across three datasets, including SUN80.
Relies on attention mechanisms rather than complex subnetworks.
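For readers unfamiliar with the PSNR metric cited above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). It is not the paper's evaluation code: the function name `psnr`, the flat-list pixel representation, and the default 8-bit peak value of 255 are assumptions made for illustration.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Illustrative helper (not from the paper); assumes pixel values lie in
    [0, max_value] and that images have been flattened to 1-D sequences.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy data: every pixel is off by 10 grey levels, so the MSE is 100.
ground_truth = [100, 100, 100, 100]
super_resolved = [110, 90, 110, 90]
score = psnr(ground_truth, super_resolved)
```

Higher values indicate a closer match to the ground truth. SSIM, the other metric named above, compares local structure rather than per-pixel error and is not covered by this sketch.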
Abstract
In this paper, we propose HiTSR, a hierarchical transformer model for reference-based image super-resolution, which enhances low-resolution input images by learning matching correspondences from high-resolution reference images. Diverging from existing multi-network, multi-stage approaches, we streamline the architecture and training pipeline by incorporating the double attention block from GAN literature. Processing two visual streams independently, we fuse self-attention and cross-attention blocks through a gating attention strategy. The model integrates a squeeze-and-excitation module to capture global context from the input images, facilitating long-range spatial interactions within window-based attention blocks. Long skip connections between shallow and deep layers further enhance information flow. Our model demonstrates superior performance across three datasets including SUN80,…
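The fusion idea described in the abstract can be sketched roughly as follows. This is an illustrative NumPy toy, not the authors' implementation: the scalar sigmoid gate, the absence of learned query/key/value projections, the single attention window, and every name here (`attention`, `gated_fusion`, `squeeze_excite`, `gate_logit`, `w1`, `w2`) are assumptions, and the paper's actual gating strategy may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; q is (n, d), k and v are (m, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def gated_fusion(lr_tokens, ref_tokens, gate_logit):
    """Blend self-attention (LR only) with cross-attention (LR queries, Ref keys/values).

    Assumption: a single scalar gate g = sigmoid(gate_logit) mixes the two streams.
    """
    self_out = attention(lr_tokens, lr_tokens, lr_tokens)
    cross_out = attention(lr_tokens, ref_tokens, ref_tokens)
    g = 1.0 / (1.0 + np.exp(-gate_logit))
    return (1.0 - g) * self_out + g * cross_out

def squeeze_excite(tokens, w1, w2):
    """Squeeze-and-excitation: rescale channels using a global average over tokens."""
    squeezed = tokens.mean(axis=0)                # (d,) global context
    hidden = np.maximum(squeezed @ w1, 0.0)       # ReLU bottleneck, (d // 2,)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # per-channel weights in (0, 1)
    return tokens * scale

# Toy data: one window of 16 tokens with 8 channels per stream (hypothetical sizes).
rng = np.random.default_rng(0)
d = 8
lr_tokens = rng.standard_normal((16, d))
ref_tokens = rng.standard_normal((16, d))
w1 = rng.standard_normal((d, d // 2))
w2 = rng.standard_normal((d // 2, d))

fused = gated_fusion(lr_tokens, ref_tokens, gate_logit=0.0)  # equal mix of both streams
out = squeeze_excite(fused, w1, w2)
```

With a strongly negative `gate_logit` the output reduces to plain self-attention over the low-resolution tokens, and with a strongly positive one it relies entirely on the reference stream. In a trained model the gate and the projection weights would be learned rather than fixed.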
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Image Processing Techniques · Advanced Vision and Imaging
Methods: Softmax · Attention Is All You Need · Low-resolution input
