Generative AI for Vision: A Comprehensive Study of Frameworks and Applications
Fouad Bousetouane

TL;DR
This paper surveys generative AI techniques for vision. It categorizes methods by input type and discusses key models, applications, and challenges, with the aim of guiding future research and practical use.
Contribution
It offers a structured classification of image generation methods based on input modalities and highlights recent frameworks and challenges in the field.
Findings
Classification of techniques by input modality
Analysis of key frameworks like DALL-E and ControlNet
Discussion of challenges such as computational costs and biases
Abstract
Generative AI is transforming image synthesis, enabling the creation of high-quality, diverse, and photorealistic visuals across industries like design, media, healthcare, and autonomous systems. Advances in techniques such as image-to-image translation, text-to-image generation, domain transfer, and multimodal alignment have broadened the scope of automated visual content creation, supporting a wide spectrum of applications. These advancements are driven by models like Generative Adversarial Networks (GANs), conditional frameworks, and diffusion-based approaches such as Stable Diffusion. This work presents a structured classification of image generation techniques based on the nature of the input, organizing methods by input modalities like noisy vectors, latent representations, and conditional inputs. We explore the principles behind these models, highlight key frameworks including…
Taxonomy
Topics: 3D Surveying and Cultural Heritage
Methods: Diffusion
