Semantic Image Synthesis via Class-Adaptive Cross-Attention
Tomaso Fontanini, Claudio Ferrari, Giuseppe Lisanti, Massimo Bertozzi, Andrea Prati

TL;DR
This paper introduces a novel semantic image synthesis method that replaces SPADE layers with class-adaptive cross-attention, improving global consistency and style transfer while maintaining high-quality image generation.
Contribution
The paper proposes a new architecture using cross-attention layers instead of SPADE for better shape-style correlation learning in semantic image synthesis.
Findings
Achieves state-of-the-art generation quality
Improves global style consistency and transfer
Enables shape manipulation without manual masks
Abstract
In semantic image synthesis the state of the art is dominated by methods that use customized variants of the SPatially-Adaptive DE-normalization (SPADE) layers, which allow for good visual generation quality and editing versatility. By design, such layers learn pixel-wise modulation parameters to de-normalize the generator activations based on the semantic class each pixel belongs to. Thus, they tend to overlook global image statistics, ultimately leading to unconvincing local style editing and causing global inconsistencies such as color or illumination distribution shifts. Also, SPADE layers require the semantic segmentation mask for mapping styles in the generator, preventing shape manipulations without manual intervention. In response, we designed a novel architecture where cross-attention layers are used in place of SPADE for learning shape-style correlations and so conditioning…
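The contrast drawn in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, tensor shapes, and projection matrices (`w_q`, `w_k`, `w_v`) are assumptions made for illustration, the first function replaces SPADE's convolutional parameter predictor with a simple per-class lookup table, and the second is generic scaled dot-product cross-attention rather than the paper's class-adaptive layer.

```python
import numpy as np


def spade_like_modulation(x, seg, gamma, beta, eps=1e-5):
    """Pixel-wise de-normalization driven by a segmentation mask.

    x:     (C, H, W) generator activations
    seg:   (H, W) integer class label per pixel
    gamma: (K, C) per-class scale (assumed lookup table, a simplification)
    beta:  (K, C) per-class shift (assumed lookup table, a simplification)
    """
    # Parameter-free normalization of each channel over spatial positions.
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / (std + eps)
    # Each pixel picks its scale and shift from its own class only,
    # so the mask is required and no other region influences the result.
    g = gamma[seg].transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)
    b = beta[seg].transpose(2, 0, 1)
    return g * x_norm + b


def cross_attention(x, style_tokens, w_q, w_k, w_v):
    """Generic cross-attention from pixels to per-class style tokens.

    x:            (C, H, W) generator activations (queries)
    style_tokens: (K, D) one style embedding per class (keys and values)
    w_q: (C, A), w_k: (D, A), w_v: (D, C) hypothetical projection matrices
    """
    c, h, w = x.shape
    q = x.reshape(c, h * w).T @ w_q          # (HW, A)
    k = style_tokens @ w_k                   # (K, A)
    v = style_tokens @ w_v                   # (K, C)
    scores = q @ k.T / np.sqrt(q.shape[1])   # (HW, K)
    # Numerically stable softmax over the K style tokens.
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    out = attn @ v                           # (HW, C)
    # Every pixel mixes all class styles; no mask is indexed here.
    return out.T.reshape(c, h, w), attn
```

In the first function each output pixel depends only on the parameters of its own class, which matches the abstract's point that such layers are local by design and need the mask. In the second, the attention weights decide how much each class style contributes at every location, so the mapping from styles to positions is learned rather than read off a mask.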
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Spatially-Adaptive Normalization
