DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation
Songhua Liu, Jingwen Ye, Sucheng Ren, Xinchao Wang

TL;DR
DynaST introduces a dynamic sparse attention Transformer that achieves fine-grained, efficient exemplar-guided image generation by adaptively focusing on relevant tokens, outperforming existing methods in detail quality and computational cost.
Contribution
The paper proposes a novel dynamic attention mechanism within a Transformer for flexible, fine-level matching in exemplar-guided image generation, applicable to both supervised and unsupervised tasks.
Findings
Outperforms state-of-the-art in local detail preservation
Reduces computational cost significantly
Effective across multiple image translation applications
Abstract
One key challenge of exemplar-guided image generation lies in establishing fine-grained correspondences between input and guided images. Prior approaches, despite the promising results, have relied on either estimating dense attention to compute per-point matching, which is limited to only coarse scales due to the quadratic memory cost, or fixing the number of correspondences to achieve linear complexity, which lacks flexibility. In this paper, we propose a dynamic sparse attention based Transformer model, termed Dynamic Sparse Transformer (DynaST), to achieve fine-level matching with favorable efficiency. The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation on the optimal number of tokens one position should focus on. Specifically, DynaST leverages the multi-layer nature of Transformer structure, and performs the dynamic attention scheme in a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Face recognition and analysis · Human Pose and Action Recognition
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Byte Pair Encoding · Attention Dropout · Position-Wise Feed-Forward Layer · Adam · Residual Connection
