Component-Aware Sketch-to-Image Generation Using Self-Attention Encoding and Coordinate-Preserving Fusion

Ali Zia; Muhammad Umer Ramzan; Usman Ali; Muhammad Faheem; Abdelwahed Khamis; Shahnawaz Qureshi

arXiv:2603.09484 · cs.CV · March 11, 2026

Open Access

TL;DR

This paper introduces a novel component-aware, two-stage framework for sketch-to-image generation that leverages self-attention and coordinate-preserving fusion to improve realism, semantic accuracy, and generalization across diverse datasets.

Contribution

It proposes a new architecture combining self-attention encoding, coordinate-preserving fusion, and iterative refinement, outperforming existing GAN and diffusion models in sketch-to-image synthesis.

Findings

01 Achieves 21% improvement in FID on CelebAMask-HQ
02 Outperforms state-of-the-art models in image fidelity and semantic accuracy
03 Demonstrates robustness across facial and non-facial datasets

Abstract

Translating freehand sketches into photorealistic images remains a fundamental challenge in image synthesis, particularly due to the abstract, sparse, and stylistically diverse nature of sketches. Existing approaches, including GAN-based and diffusion-based models, often struggle to reconstruct fine-grained details, maintain spatial alignment, or adapt across different sketch domains. In this paper, we propose a component-aware, self-refining framework for sketch-to-image generation that addresses these challenges through a novel two-stage architecture. A Self-Attention-based Autoencoder Network (SA2N) first captures localised semantic and structural features from component-wise sketch regions, while a Coordinate-Preserving Gated Fusion (CGF) module integrates these into a coherent spatial layout. Finally, a Spatially Adaptive Refinement Revisor (SARR), built on a modified StyleGAN2…
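The abstract describes CGF only at a high level, so the following is a minimal illustrative sketch of the general idea of coordinate-preserving gated fusion, not the authors' implementation. It assumes each component yields a feature map plus the top-left position of its crop in the full layout, and it uses a simple per-pixel sigmoid gate. The names `gated_fuse` and `gate_weight`, the tensor shapes, and the gate form are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fuse(canvas, components, gate_weight):
    """Blend component feature maps into a shared canvas at their original coordinates.

    canvas:      (C, H, W) array holding the global feature layout.
    components:  list of (features, (top, left)), features shaped (C, h, w).
    gate_weight: (C,) vector; a hypothetical stand-in for learned gate parameters.
    """
    fused = canvas.copy()  # leave the caller's canvas untouched
    for feat, (top, left) in components:
        _, h, w = feat.shape
        region = fused[:, top:top + h, left:left + w]
        # Assumed gate: per-pixel value in (0, 1) from a channel-wise projection.
        gate = sigmoid(np.tensordot(gate_weight, feat, axes=([0], [0])))  # (h, w)
        # Gated blend written back to the same coordinates the crop came from.
        fused[:, top:top + h, left:left + w] = gate * feat + (1.0 - gate) * region
    return fused


# Toy usage: one 3x3 component placed at row 2, column 4 of an 8x8 canvas.
canvas = np.zeros((2, 8, 8))
component = np.ones((2, 3, 3))
fused = gated_fuse(canvas, [(component, (2, 4))], gate_weight=np.zeros(2))
```

With a zero `gate_weight` the gate is 0.5 everywhere, so the pasted region becomes an equal mix of the component and the canvas, while everything outside the component's box is left unchanged. In a trained model the gate would be learned rather than fixed.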

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Computer Graphics and Visualization Techniques