Component-Aware Sketch-to-Image Generation Using Self-Attention Encoding and Coordinate-Preserving Fusion
Ali Zia, Muhammad Umer Ramzan, Usman Ali, Muhammad Faheem, Abdelwahed Khamis, Shahnawaz Qureshi

TL;DR
This paper introduces a novel component-aware, two-stage framework for sketch-to-image generation that leverages self-attention and coordinate-preserving fusion to improve realism, semantic accuracy, and generalization across diverse datasets.
Contribution
It proposes a new architecture combining self-attention encoding, coordinate-preserving fusion, and iterative refinement, outperforming existing GAN and diffusion models in sketch-to-image synthesis.
Findings
Achieves a 21% improvement in FID on CelebAMask-HQ
Outperforms state-of-the-art models in image fidelity and semantic accuracy
Demonstrates robustness across facial and non-facial datasets
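FID (Fréchet Inception Distance) is a distance between feature statistics of real and generated images, so lower is better. The summary does not give the baseline or the absolute scores. Assuming the "21% improvement" is a relative reduction over the compared baseline's FID, the arithmetic is the short helper below. The function name and the example scores are hypothetical illustrations, not values reported in the paper.

```python
def relative_fid_improvement(baseline_fid: float, new_fid: float) -> float:
    """Percent reduction in FID relative to a baseline (hypothetical helper).

    Because FID is lower-is-better, a positive result means the new model
    improved on the baseline and a negative result means it got worse.
    """
    if baseline_fid <= 0:
        raise ValueError("baseline_fid must be positive")
    return 100.0 * (baseline_fid - new_fid) / baseline_fid


if __name__ == "__main__":
    # Hypothetical scores, chosen only to illustrate what a 21% reduction means.
    print(relative_fid_improvement(100.0, 79.0))  # 21.0
```

Under this reading, a 21% gain says nothing about absolute quality on its own; the same ratio holds whether the baseline FID is 100 or 10.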
Abstract
Translating freehand sketches into photorealistic images remains a fundamental challenge in image synthesis, particularly due to the abstract, sparse, and stylistically diverse nature of sketches. Existing approaches, including GAN-based and diffusion-based models, often struggle to reconstruct fine-grained details, maintain spatial alignment, or adapt across different sketch domains. In this paper, we propose a component-aware, self-refining framework for sketch-to-image generation that addresses these challenges through a novel two-stage architecture. A Self-Attention-based Autoencoder Network (SA2N) first captures localised semantic and structural features from component-wise sketch regions, while a Coordinate-Preserving Gated Fusion (CGF) module integrates these into a coherent spatial layout. Finally, a Spatially Adaptive Refinement Revisor (SARR), built on a modified StyleGAN2…
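The abstract describes two steps: an encoder (SA2N) that applies self-attention to component-wise sketch regions, and a fusion module (CGF) that places the encoded components back into one coherent spatial layout. The NumPy sketch below illustrates that idea under stated assumptions and is not the authors' implementation: it assumes single-head scaled dot-product attention with random, untrained projections, a residual connection, and a per-pixel sigmoid gate computed from each component's own features. The function names, the component boxes, and the gating rule are all hypothetical, and the StyleGAN2-based refinement stage (SARR) is not modeled.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (N, C) array of N tokens (spatial positions) with C channels.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over keys
    return weights @ v


def encode_component(patch, rng):
    """Hypothetical stand-in for an SA2N-style encoder on one component crop.

    Flattens an (H, W, C) feature patch into H*W tokens, applies self-attention
    with random (untrained) projections plus a residual connection, and
    reshapes the result back to (H, W, C).
    """
    h, w, c = patch.shape
    tokens = patch.reshape(h * w, c)
    w_q, w_k, w_v = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
    return (tokens + self_attention(tokens, w_q, w_k, w_v)).reshape(h, w, c)


def coordinate_preserving_fusion(canvas, components, gate_bias=0.0):
    """Hypothetical gated fusion that keeps each component at its source coordinates.

    canvas: (H, W, C) global feature map.
    components: list of ((top, left), feature) pairs, each feature shaped (h, w, C).
    A per-pixel sigmoid gate blends each component into the canvas region it was
    cropped from, so the spatial layout is preserved rather than re-learned.
    """
    fused = canvas.copy()
    for (top, left), feat in components:
        h, w, _ = feat.shape
        region = fused[top:top + h, left:left + w]
        logits = feat.mean(axis=-1, keepdims=True) + gate_bias
        gate = 1.0 / (1.0 + np.exp(-logits))  # shape (h, w, 1), broadcast over channels
        fused[top:top + h, left:left + w] = gate * feat + (1.0 - gate) * region
    return fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    canvas = rng.standard_normal((64, 64, 8))
    # Hypothetical component boxes as (top, left, height, width) on a 64x64 feature map.
    boxes = {"left_eye": (20, 14, 8, 12), "right_eye": (20, 38, 8, 12), "mouth": (44, 22, 10, 20)}
    components = [
        ((t, l), encode_component(canvas[t:t + h, l:l + w], rng))
        for t, l, h, w in boxes.values()
    ]
    fused = coordinate_preserving_fusion(canvas, components)
    print(fused.shape)  # (64, 64, 8)
```

The design point this illustrates is that each component is indexed by the coordinates it was cropped from, so fusion only decides how much of the component to keep at each pixel, never where the component goes. Positions outside every component box pass through unchanged.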
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Computer Graphics and Visualization Techniques
