Texture Image Synthesis Using Spatial GAN Based on Vision Transformers
Elahe Salari, Zohreh Azimifar

TL;DR
This paper introduces ViT-SGAN, a hybrid model combining Vision Transformers and Spatial GANs, which significantly improves the quality and diversity of texture synthesis by capturing complex spatial dependencies.
Contribution
The paper presents a novel hybrid model that integrates texture descriptors into Vision Transformers within a GAN framework for advanced texture synthesis.
Findings
Superior texture quality over state-of-the-art models
Effective capture of complex spatial dependencies
Substantial improvements in FID, IS, SSIM, and LPIPS metrics
Abstract
Texture synthesis is a fundamental task in computer vision, whose goal is to generate visually realistic and structurally coherent textures for a wide range of applications, from graphics to scientific simulations. While traditional methods like tiling and patch-based techniques often struggle with complex textures, recent advancements in deep learning have transformed this field. In this paper, we propose ViT-SGAN, a new hybrid model that fuses Vision Transformers (ViTs) with a Spatial Generative Adversarial Network (SGAN) to address the limitations of previous methods. By incorporating specialized texture descriptors such as mean-variance (mu, sigma) and textons into the self-attention mechanism of ViTs, our model achieves superior texture synthesis. This approach enhances the model's capacity to capture complex spatial dependencies, leading to improved texture quality that is…
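As a rough illustration of the idea in the abstract, the sketch below computes a per-patch mean and standard deviation (mu, sigma) and appends them to the patch tokens before a single self-attention layer. This is not the authors' implementation: the abstract does not say how the descriptors enter the attention mechanism, so the concatenation step, the function names `patch_descriptors` and `self_attention`, the patch size, and the random untrained projections are all assumptions made for illustration. Textons are omitted.

```python
import numpy as np

def patch_descriptors(image, patch):
    """Split a 2-D grayscale image into non-overlapping square patches.

    Returns the flattened patch tokens and a per-patch (mean, std) descriptor.
    """
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    # (h/p, p, w/p, p) -> (h/p, w/p, p, p) -> (num_patches, p*p)
    tokens = (image.reshape(h // patch, patch, w // patch, patch)
                   .transpose(0, 2, 1, 3)
                   .reshape(-1, patch * patch))
    mu = tokens.mean(axis=1, keepdims=True)
    sigma = tokens.std(axis=1, keepdims=True)
    return tokens, np.concatenate([mu, sigma], axis=1)

def self_attention(x, rng, dim=16):
    """Single-head scaled dot-product self-attention.

    The projection matrices are random and untrained (illustration only).
    """
    d_in = x.shape[1]
    wq, wk, wv = (rng.standard_normal((d_in, dim)) / np.sqrt(d_in)
                  for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(dim)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.random((32, 32))                 # stand-in for a texture image
tokens, desc = patch_descriptors(image, patch=8)
# Assumed fusion step: append (mu, sigma) to each patch token.
augmented = np.concatenate([tokens, desc], axis=1)
out, weights = self_attention(augmented, rng)
```

Concatenation is only one plausible way to expose the descriptors to attention; a model could equally use them as an additive bias on the attention scores or as separate tokens.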
Taxonomy
Topics: Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction · Generative Adversarial Networks and Image Synthesis
