Generative AI for Vision: A Comprehensive Study of Frameworks and Applications

Fouad Bousetouane

arXiv:2501.18033·cs.CV·January 31, 2025

Open Access

TL;DR

This paper provides a comprehensive overview of generative AI techniques for vision. It categorizes methods by input type and discusses key models, applications, and open challenges to guide future research and practical use.

Contribution

It offers a structured classification of image generation methods based on input modalities and highlights recent frameworks and challenges in the field.

Findings

01. Classification of techniques by input modality
02. Analysis of key frameworks like DALL-E and ControlNet
03. Discussion of challenges such as computational costs and biases

Abstract

Generative AI is transforming image synthesis, enabling the creation of high-quality, diverse, and photorealistic visuals across industries like design, media, healthcare, and autonomous systems. Advances in techniques such as image-to-image translation, text-to-image generation, domain transfer, and multimodal alignment have broadened the scope of automated visual content creation, supporting a wide spectrum of applications. These advancements are driven by models like Generative Adversarial Networks (GANs), conditional frameworks, and diffusion-based approaches such as Stable Diffusion. This work presents a structured classification of image generation techniques based on the nature of the input, organizing methods by input modalities like noisy vectors, latent representations, and conditional inputs. We explore the principles behind these models, highlight key frameworks including…
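The abstract names diffusion-based approaches such as Stable Diffusion. As an illustrative sketch only (not code from the paper), the closed-form forward noising step used by DDPM-style diffusion models, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, can be written in plain Python. The linear schedule endpoints (1e-4, 0.02), the 1000-step horizon, and the function names are assumptions chosen for illustration. Stable Diffusion applies this process to latent representations rather than raw pixels.

```python
import math
import random

# Hypothetical helper names; schedule values are assumed DDPM-style defaults.
def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I) in a single step."""
    a_bar = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
           for x, e in zip(x0, noise)]
    return x_t, noise

if __name__ == "__main__":
    betas = linear_beta_schedule(1000)
    alpha_bars = cumulative_alphas(betas)
    rng = random.Random(0)
    x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # toy flattened "image" in [-1, 1]
    for t in (0, 250, 999):
        x_t, _ = forward_diffuse(x0, t, alpha_bars, rng)
        print(f"t={t:4d}  signal weight={math.sqrt(alpha_bars[t]):.4f}")
```

As t grows, the signal weight sqrt(ᾱ_t) shrinks toward zero, so x_t approaches pure Gaussian noise; a generative model is then trained to reverse this process.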

Taxonomy

Topics: 3D Surveying and Cultural Heritage

Methods: Diffusion