Comparative Analysis of Generative Models: Enhancing Image Synthesis with VAEs, GANs, and Stable Diffusion

Sanchayan Vivekananthan

arXiv:2408.08751·cs.CV·August 19, 2024·6 cites

Open Access

TL;DR

This paper compares VAEs, GANs, and Stable Diffusion models for image synthesis, analyzing their strengths, limitations, and recent enhancements with segmentation and inpainting techniques to guide model selection.

Contribution

It compares three major generative model families (VAEs, GANs, and Stable Diffusion) and explores how combining Stable Diffusion with Grounding DINO and Grounded SAM improves image accuracy through segmentation and inpainting.

Findings

01

VAEs produce blurry images but learn latent representations.

02

GANs generate realistic images but suffer from mode collapse.

03

Stable Diffusion offers high-quality images with semantic coherence, enhanced by segmentation and inpainting.

Abstract

This paper examines three major generative modelling frameworks: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Stable Diffusion models. VAEs are effective at learning latent representations but frequently yield blurry results. GANs can generate realistic images but face issues such as mode collapse. Stable Diffusion models produce high-quality images with strong semantic coherence but are demanding in terms of computational resources. Additionally, the paper explores how incorporating Grounding DINO and Grounded SAM with Stable Diffusion improves image accuracy by utilising sophisticated segmentation and inpainting techniques. The analysis offers guidance on selecting suitable models for various applications and highlights areas for further research.
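
As a rough illustration of the VAE trade-off mentioned in the abstract, the sketch below computes a standard VAE training loss for a single example: a reconstruction term plus a KL term that pulls the latent code towards a standard normal prior. It is not taken from the paper. The function names (`gaussian_kl`, `vae_loss`) and the choice of a squared-error reconstruction term are assumptions made for illustration. A per-pixel squared-error term of this kind is one commonly cited reason why VAE samples tend to look blurry.

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions.

    Closed form for a diagonal Gaussian posterior and a standard normal prior.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO, up to constants and scaling, for one example.

    Assumption: squared error stands in for the reconstruction log-likelihood
    (a Gaussian decoder with fixed variance). Other choices are possible.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + gaussian_kl(mu, logvar)
```

When the encoder output matches the prior exactly (`mu` all zeros, `logvar` all zeros), the KL term vanishes and the loss reduces to the reconstruction error alone.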

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Dense Connections · Vision Transformer · self-DIstillation with NO labels · Diffusion