SAT: Selective Aggregation Transformer for Image Super-Resolution

Dinh Phu Tran; Thao Do; Saad Wazir; Seongah Kim; Seon Kwon Kim; Daeyoung Kim

arXiv:2604.07994·cs.CV·April 13, 2026

TL;DR

The paper introduces SAT, a transformer model for image super-resolution that efficiently captures long-range dependencies with reduced computational costs, outperforming previous methods in quality and efficiency.

Contribution

SAT uses a Density-driven Token Aggregation algorithm to selectively aggregate the key-value matrices while keeping the query matrix at full resolution. This enlarges the receptive field and reduces complexity while preserving high-frequency details.

Findings

1. SAT outperforms PFT by up to 0.22 dB in super-resolution quality.

2. The model reduces FLOPs by up to 27% compared to prior methods.

3. Selective aggregation preserves critical high-frequency details.

Abstract

Transformer-based approaches have revolutionized image super-resolution by modeling long-range dependencies. However, the quadratic computational complexity of vanilla self-attention mechanisms poses significant challenges, often leading to compromises between efficiency and global context exploitation. Recent window-based attention methods mitigate this by localizing computations, but they often yield restricted receptive fields. To address these limitations, we propose Selective Aggregation Transformer (SAT). This novel transformer efficiently captures long-range dependencies, leading to an enlarged model receptive field by selectively aggregating key-value matrices (reducing the number of tokens by 97%) via our Density-driven Token Aggregation algorithm while maintaining the full resolution of the query matrix. This design significantly reduces computational costs, resulting in…
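
The core idea in the abstract, full-resolution queries attending to a much smaller set of aggregated keys and values, can be illustrated with a minimal NumPy sketch. This is not the paper's Density-driven Token Aggregation algorithm, which the abstract does not specify. As a stand-in, the sketch simply averages consecutive groups of tokens. The function names (`pool_tokens`, `aggregated_attention`) and the sizes used are assumptions chosen for illustration; the group size of 32 was picked so that the key-value token count drops by roughly the 97% quoted above.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pool_tokens(x, group):
    """Stand-in aggregation (assumption): average each run of `group` tokens.

    The paper's density-driven selection is NOT reproduced here.
    """
    n, d = x.shape
    assert n % group == 0, "token count must be divisible by group size"
    return x.reshape(n // group, group, d).mean(axis=1)


def aggregated_attention(q, k, v, group):
    """Attention with full-resolution queries and aggregated keys/values."""
    k_agg = pool_tokens(k, group)              # (n / group, d)
    v_agg = pool_tokens(v, group)              # (n / group, d)
    scores = q @ k_agg.T / np.sqrt(q.shape[-1])  # (n, n / group)
    return softmax(scores, axis=-1) @ v_agg    # (n, d): one output per query


rng = np.random.default_rng(0)
n_tokens, dim, group = 1024, 32, 32            # hypothetical sizes
q, k, v = (rng.standard_normal((n_tokens, dim)) for _ in range(3))

out = aggregated_attention(q, k, v, group)

# Query-key pairs scored: full attention vs. aggregated keys/values.
full_pairs = n_tokens * n_tokens
agg_pairs = n_tokens * (n_tokens // group)
reduction = 1 - agg_pairs / full_pairs         # fraction of pairs avoided
```

Because the queries are never pooled, the output keeps one row per input token, while the score matrix shrinks from `n × n` to `n × (n / group)`. The reduction computed here counts only attention score pairs under the stand-in pooling and is not comparable to the whole-model FLOPs figure reported in the findings.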
