S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chiyu Zhang, Xiaogang Xu, Lei Wang, Zaiyan Dai, Jun Yang

TL;DR
S2WAT is a hierarchical vision transformer for image style transfer that computes attention in windows of different shapes and merges the results adaptively, so that both local and global dependencies are captured.
Contribution
The paper proposes the Strips Window Attention Transformer (S2WAT), a hierarchical vision transformer that computes attention in windows of different shapes and combines the results with an adaptive "Attn Merge" strategy for style transfer.
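To make "windows of different shapes" concrete, here is a minimal NumPy sketch of splitting a feature map into horizontal strips, vertical strips, and square windows. The helper name `partition_windows`, the strip width `n`, and the choice of a `2n x 2n` square window are assumptions made for illustration; this summary does not specify the exact shapes or sizes the authors use.

```python
import numpy as np

def partition_windows(x, win_h, win_w):
    """Split an (H, W, C) feature map into non-overlapping (win_h, win_w) windows.

    Hypothetical helper for illustration only.
    Returns an array of shape (num_windows, win_h * win_w, C).
    """
    H, W, C = x.shape
    assert H % win_h == 0 and W % win_w == 0, "window must tile the feature map"
    x = x.reshape(H // win_h, win_h, W // win_w, win_w, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows of windows, cols of windows, win_h, win_w, C)
    return x.reshape(-1, win_h * win_w, C)

# Toy feature map; sizes are arbitrary assumptions.
H, W, C, n = 8, 8, 4, 2
feat = np.random.default_rng(0).normal(size=(H, W, C))

horizontal = partition_windows(feat, n, W)        # full-width strips, n rows tall
vertical = partition_windows(feat, H, n)          # full-height strips, n columns wide
square = partition_windows(feat, 2 * n, 2 * n)    # local square windows

print(horizontal.shape, vertical.shape, square.shape)
```

Attention restricted to a strip lets each token see a full row or column band of the image, which is one way to reach long-range context, while the square windows keep the computation local. Each window here holds the same number of tokens, so the three branches cost about the same.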
Findings
Compares favorably with state-of-the-art (SOTA) transformer-based and other style transfer methods
Captures both short- and long-range dependencies by computing attention in windows of different shapes
Effectiveness is demonstrated through extensive experiments on representative datasets
Abstract
Transformer's recent integration into style transfer leverages its proficiency in establishing long-range dependencies, albeit at the expense of attenuated local modeling. This paper introduces Strips Window Attention Transformer (S2WAT), a novel hierarchical vision transformer designed for style transfer. S2WAT employs attention computation in diverse window shapes to capture both short- and long-range dependencies. The merged dependencies utilize the "Attn Merge" strategy, which adaptively determines spatial weights based on their relevance to the target. Extensive experiments on representative datasets show the proposed method's effectiveness compared to state-of-the-art (SOTA) transformer-based and other approaches. The code and pre-trained models are available at https://github.com/AlienZhang1996/S2WAT.
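The abstract describes "Attn Merge" only as adaptively determining spatial weights based on relevance to the target. One plausible reading can be sketched as follows, assuming relevance is a per-position dot product between the target features and each candidate attention output, normalized with a softmax across candidates. The function name `attn_merge`, the dot-product score, and the tensor layout are assumptions, not the authors' confirmed formulation; the linked repository is the authoritative source.

```python
import numpy as np

def attn_merge(target, candidates):
    """Merge candidate feature maps with per-position weights.

    Illustrative sketch only (assumed formulation).
    target:     (H, W, C) features the candidates are compared against.
    candidates: list of K arrays, each (H, W, C), e.g. outputs of
                attention computed in differently shaped windows.
    Returns (merged, weights) with shapes (H, W, C) and (H, W, K).
    """
    stacked = np.stack(candidates, axis=2)                  # (H, W, K, C)
    # Relevance of each candidate to the target at every spatial position.
    scores = np.einsum("hwc,hwkc->hwk", target, stacked)
    # Numerically stable softmax over the K candidates.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of candidates at every position.
    merged = np.einsum("hwk,hwkc->hwc", weights, stacked)
    return merged, weights

rng = np.random.default_rng(0)
target = rng.normal(size=(4, 4, 8))
candidates = [rng.normal(size=(4, 4, 8)) for _ in range(3)]
merged, weights = attn_merge(target, candidates)
print(merged.shape, weights.shape)
```

Because the weights are computed separately at each spatial position, one region of the image can lean on a long-range strip result while another leans on the local window result, in contrast to summing or concatenating the branches with fixed weights.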
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques · Music and Audio Processing
Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer · Adam · Label Smoothing · Convolution · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding
