Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution
Ming Cheng, Haoyu Ma, Qiufang Ma, Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Xuhan Sheng, Shijie Zhao, Junlin Li, Li Zhang

TL;DR
This paper introduces HTCAN, a hybrid network for stereo image super-resolution that uses a transformer for single-image enhancement and a CNN for stereo information fusion. Combined with a multi-patch training strategy, it outperforms existing methods.
Contribution
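The paper proposes a novel hybrid network combining transformers and CNNs for stereo super-resolution. It addresses two limitations of existing transformer-based methods: they cannot exploit complementary stereo information, and they depend on large amounts of training data.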
Findings
Achieved 23.90 dB PSNR in the NTIRE 2023 challenge
Outperformed existing stereo super-resolution methods
Utilized multi-patch training and larger window sizes
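PSNR, the metric behind the 23.90 dB figure, is 10·log10(MAX²/MSE). Below is a minimal sketch. It assumes 8-bit pixel values (MAX = 255) passed as flat lists. The names `psnr` and `stereo_psnr` are hypothetical. Averaging the left-view and right-view scores is also an assumption, not the challenge's documented protocol, which may additionally crop borders or evaluate in a specific color space.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel values. This sketch performs no
    border cropping or color-space conversion.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)

def stereo_psnr(ref_left, est_left, ref_right, est_right, max_value=255.0):
    # Assumption: a stereo score is the mean of the per-view PSNRs.
    left = psnr(ref_left, est_left, max_value)
    right = psnr(ref_right, est_right, max_value)
    return (left + right) / 2.0
```

For example, `psnr([0] * 4, [10] * 4)` has an MSE of 100 and returns 10·log10(65025/100) ≈ 28.13 dB.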
Abstract
Multi-stage strategies are frequently employed in image restoration tasks. While transformer-based methods have exhibited high efficiency in single-image super-resolution tasks, they have not yet shown significant advantages over CNN-based methods in stereo super-resolution tasks. This can be attributed to two key factors: first, current single-image super-resolution transformers are unable to leverage the complementary stereo information during the process; second, the performance of transformers is typically reliant on sufficient data, which is absent in common stereo-image super-resolution algorithms. To address these issues, we propose a Hybrid Transformer and CNN Attention Network (HTCAN), which utilizes a transformer-based network for single-image enhancement and a CNN-based network for stereo information fusion. Furthermore, we employ a multi-patch training strategy and larger…
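This summary does not say which transformer the single-image stage uses. The sketch below assumes Swin-style window self-attention, a common choice in SR transformers that is not confirmed here, to show what a larger window size changes. The feature map is tiled into non-overlapping M × M windows, and each token attends to the M² tokens of its own window. The total number of query-key pairs is therefore H·W·M². `window_partition` and `attention_pairs` are hypothetical helpers that operate on plain nested lists.

```python
def window_partition(feature, window):
    """Split an H x W grid (list of rows) into non-overlapping window x window tiles.

    Tiles are returned in row-major order: left to right, then top to bottom.
    """
    h, w = len(feature), len(feature[0])
    if h % window or w % window:
        raise ValueError("H and W must be multiples of the window size")
    tiles = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            tile = [row[left:left + window] for row in feature[top:top + window]]
            tiles.append(tile)
    return tiles

def attention_pairs(h, w, window):
    # Each of the (h / window) * (w / window) windows holds window**2 tokens,
    # and every token attends to every token in its window: window**4 pairs.
    num_windows = (h // window) * (w // window)
    return num_windows * window ** 4
```

For a 48 × 48 feature map, `attention_pairs(48, 48, 8)` gives 147456 pairs and `attention_pairs(48, 48, 16)` gives 589824. Doubling M lets each token see four times as many input pixels, at four times the attention cost.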
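Stereo SR networks often fuse stereo information with cross-view attention along image rows. In a rectified pair, the match for a left-view pixel lies on the same row of the right view. The sketch below assumes that mechanism for the CNN-based fusion stage. It is not taken from HTCAN, and it omits the learned convolutional projections a real module would apply to queries, keys and values. `cross_view_row_attention` is a hypothetical name.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_view_row_attention(left_row, right_row):
    """Fuse right-view features into one row of the left view.

    left_row, right_row: lists of feature vectors (each of length C) taken
    from the same row of a rectified stereo pair. The two rows may differ in
    length. Keys and values are the raw right-view features (no learned
    projections in this simplified sketch).
    """
    c = len(left_row[0])
    scale = 1.0 / math.sqrt(c)
    fused = []
    for query in left_row:
        # Score every right-view position on the same row (the epipolar line).
        weights = softmax([dot(query, key) * scale for key in right_row])
        # Attention-weighted sum of the right-view features.
        attended = [sum(wt * key[k] for wt, key in zip(weights, right_row)) for k in range(c)]
        # Residual connection keeps the original left-view features.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused
```

Each left-view pixel attends only to the W positions on its own row. The total cost is therefore H·W² score computations rather than (H·W)² for full-image attention, which is why row-wise (epipolar) attention is a typical design choice for stereo fusion.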
Taxonomy
Topics: Advanced Image Processing Techniques · Image Processing Techniques and Applications · Advanced Vision and Imaging
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Dense Connections · Residual Connection · Adam
