SAT: Selective Aggregation Transformer for Image Super-Resolution
Dinh Phu Tran, Thao Do, Saad Wazir, Seongah Kim, Seon Kwon Kim, Daeyoung Kim

TL;DR
The paper introduces SAT, a transformer model for image super-resolution that efficiently captures long-range dependencies with reduced computational costs, outperforming previous methods in quality and efficiency.
Contribution
SAT applies a novel density-driven token aggregation to the key-value matrices, enlarging the receptive field while keeping the query matrix at full resolution. This reduces complexity and preserves high-frequency details.
Findings
SAT outperforms PFT by up to 0.22 dB in super-resolution quality.
The model reduces FLOPs by up to 27% compared to prior methods.
Selective aggregation preserves critical high-frequency details.
Abstract
Transformer-based approaches have revolutionized image super-resolution by modeling long-range dependencies. However, the quadratic computational complexity of vanilla self-attention mechanisms poses significant challenges, often leading to compromises between efficiency and global context exploitation. Recent window-based attention methods mitigate this by localizing computations, but they often yield restricted receptive fields. To mitigate these limitations, we propose Selective Aggregation Transformer (SAT). This novel transformer efficiently captures long-range dependencies, leading to an enlarged model receptive field by selectively aggregating key-value matrices (reducing the number of tokens by 97%) via our Density-driven Token Aggregation algorithm while maintaining the full resolution of the query matrix. This design significantly reduces computational costs, resulting in…
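To make the cost argument in the abstract concrete, the following is a minimal pure-Python sketch of attention with full-resolution queries and aggregated keys and values. It is not the paper's method: the Density-driven Token Aggregation algorithm is not described in this summary, so plain average pooling over consecutive tokens stands in for it. The function names, the token count, the feature dimension, and the group size of 32 (chosen only so that the key-value token count drops by roughly 97%) are all assumptions made for illustration.

```python
import math
import random


def aggregate_tokens(tokens, group_size):
    """Stand-in aggregation: average each run of `group_size` consecutive tokens.

    ASSUMPTION: this is plain average pooling, NOT the paper's
    Density-driven Token Aggregation, which this summary does not specify.
    """
    groups = [tokens[i:i + group_size] for i in range(0, len(tokens), group_size)]
    return [[sum(col) / len(group) for col in zip(*group)] for group in groups]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def reduced_kv_attention(q, k, v, group_size):
    """Scaled dot-product attention: every query kept, keys/values aggregated."""
    k_agg = aggregate_tokens(k, group_size)
    v_agg = aggregate_tokens(v, group_size)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_i in q:
        # One score per aggregated key, so each row has len(k_agg) entries.
        scores = [scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k_agg]
        weights = softmax(scores)
        out.append([
            sum(w * v_j[d] for w, v_j in zip(weights, v_agg))
            for d in range(len(v_agg[0]))
        ])
    return out, len(k_agg)


# Hypothetical sizes, chosen only for illustration.
random.seed(0)
n_tokens, dim, group_size = 64, 8, 32


def random_tokens():
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]


q, k, v = random_tokens(), random_tokens(), random_tokens()
out, n_kv = reduced_kv_attention(q, k, v, group_size)

full_pairs = n_tokens * n_tokens   # query-key pairs in vanilla self-attention
reduced_pairs = n_tokens * n_kv    # query-key pairs after aggregation
print(len(out), n_kv, reduced_pairs / full_pairs)
```

With these toy sizes the output still has one row per query token (64), while the score matrix shrinks from 64 × 64 to 64 × 2 entries, about 3% of the original query-key pairs. The sketch only illustrates why shrinking the key-value side cuts attention cost while the query side stays at full resolution; it says nothing about which tokens the paper's algorithm chooses to merge.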
