Contrast: A Hybrid Architecture of Transformers and State Space Models for Low-Level Vision
Aman Urumbekov, Zheng Chen

TL;DR
Contrast is a hybrid model that combines transformers and state space models (Mamba) for image super-resolution. It leverages their complementary strengths to address the restricted receptive field of window-based attention and the approximate, lossy context storage of Mamba's hidden state.
Contribution
This paper introduces a novel hybrid architecture that integrates transformers and state space models for low-level vision tasks, enhancing super-resolution performance.
Findings
Improved super-resolution accuracy over existing models
Effective combination of global context and pixel-level detail
Mitigation of individual model limitations
Abstract
Transformers have become increasingly popular for image super-resolution (SR) tasks due to their strong global context modeling capabilities. However, their quadratic computational complexity necessitates the use of window-based attention mechanisms, which restricts the receptive field and limits effective context expansion. Recently, the Mamba architecture has emerged as a promising alternative with linear computational complexity, allowing it to avoid window mechanisms and maintain a large receptive field. Nevertheless, Mamba faces challenges in handling long-context dependencies when high pixel-level precision is required, as in SR tasks. This is due to its hidden state mechanism, which can compress and store a substantial amount of context but only in an approximate manner, leading to inaccuracies that transformers do not suffer from. In this paper, we propose Contrast, a…
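The complexity argument in the abstract can be made concrete with a rough operation count. The sketch below is illustrative only and is not taken from the paper: the function names, the window size of 8, and the state size of 16 are assumptions, and the counts ignore channel dimensions and constant factors. It compares how many token pairs full self-attention scores, how many window-based attention scores, and how many state updates a linear-time scan with a fixed-size hidden state performs.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Token pairs scored by full self-attention over an H x W feature map."""
    n = height * width
    return n * n  # quadratic in the number of tokens


def window_attention_pairs(height: int, width: int, window: int) -> int:
    """Token pairs scored when attention is limited to non-overlapping
    window x window blocks (assumes H and W are divisible by window)."""
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2  # linear in the number of tokens


def linear_scan_updates(height: int, width: int, state_size: int) -> int:
    """Hidden-state entries updated by a recurrent scan: a fixed-size state
    (state_size entries, an assumed value) is updated once per token."""
    return height * width * state_size  # linear in the number of tokens


if __name__ == "__main__":
    # Window size 8 and state size 16 are illustrative assumptions.
    for side in (64, 128, 256):
        print(
            side,
            global_attention_pairs(side, side),
            window_attention_pairs(side, side, window=8),
            linear_scan_updates(side, side, state_size=16),
        )
```

Under this simplified model, doubling the side length of the feature map multiplies the global attention count by 16 but the windowed and scan counts by only 4. The windowed variant pays for that saving with a receptive field capped at the window, while the scan sees the whole sequence through a state whose size does not grow with the image, which is consistent with the abstract's point that its stored context is only approximate.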
Taxonomy
Topics: CCD and CMOS Imaging Sensors
Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
