Comparative Analysis of Generative Models: Enhancing Image Synthesis with VAEs, GANs, and Stable Diffusion
Sanchayan Vivekananthan

TL;DR
This paper compares VAEs, GANs, and Stable Diffusion models for image synthesis, analyzing their strengths, limitations, and recent enhancements with segmentation and inpainting techniques to guide model selection.
Contribution
It provides a comprehensive comparison of major generative models and introduces improvements for Stable Diffusion using advanced segmentation and inpainting methods.
Findings
VAEs produce blurry images but learn latent representations.
GANs generate realistic images but suffer from mode collapse.
Stable Diffusion offers high-quality images with semantic coherence, enhanced by segmentation and inpainting.
Abstract
This paper examines three major generative modelling frameworks: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Stable Diffusion models. VAEs are effective at learning latent representations but frequently yield blurry results. GANs can generate realistic images but face issues such as mode collapse. Stable Diffusion models produce high-quality images with strong semantic coherence but are demanding in terms of computational resources. Additionally, the paper explores how incorporating Grounding DINO and Grounded SAM with Stable Diffusion improves image accuracy by utilising sophisticated segmentation and inpainting techniques. The analysis offers guidance on selecting suitable models for various applications and highlights areas for further research.
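The abstract does not spell out the VAE objective, so the following is only a minimal sketch of the standard formulation, assuming a diagonal Gaussian encoder and a squared-error reconstruction term. All function names are hypothetical and are not taken from the paper. It shows the two terms a VAE trades off: a reconstruction error and a KL penalty that pulls the latent code toward a standard normal prior. Pixel-wise reconstruction error of this kind is a commonly cited reason for the blurry outputs mentioned above.

```python
import math
import random


def kl_diag_gaussian_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def squared_error(x, x_hat):
    """Sum of squared pixel errors (a Gaussian likelihood up to scaling and constants)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def negative_elbo(x, x_hat, mu, logvar, beta=1.0):
    """Reconstruction error plus beta-weighted KL term; beta=1 is the standard VAE."""
    return squared_error(x, x_hat) + beta * kl_diag_gaussian_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)               # fixed seed, for repeatability only
    mu, logvar = [1.0, -0.5], [0.0, -1.0]  # illustrative encoder outputs
    z = reparameterize(mu, logvar, rng)  # latent sample that a decoder would consume
    x, x_hat = [1.0, 0.0, 0.5], [0.8, 0.1, 0.5]  # toy "image" and reconstruction
    print("latent sample:", z)
    print("negative ELBO:", negative_elbo(x, x_hat, mu, logvar))
```

In a real model the encoder and decoder would be neural networks and the loss would be minimised by gradient descent. The toy vectors here only stand in for their outputs.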
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Dense Connections · Vision Transformer · self-DIstillation with NO labels · Diffusion
