Image Generation Models: A Technical History
Rouzbeh Shirvani

TL;DR
This paper surveys key image generation models, detailing their architectures, training methods, and limitations, and covers recent advances in video generation and responsible deployment.
Contribution
It offers a detailed technical overview of various image generation models and discusses recent developments and challenges in the field.
Findings
Detailed walkthrough of VAEs, GANs, flows, transformers, and diffusion models.
Analysis of failure modes, limitations, and robustness issues.
Overview of recent progress in video generation and deepfake detection.
Abstract
Image generation has advanced rapidly over the past decade, yet the literature seems fragmented across different models and application domains. This paper aims to offer a comprehensive survey of breakthrough image generation models, including variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows, autoregressive and transformer-based generators, and diffusion-based methods. We provide a detailed technical walkthrough of each model type, including its underlying objectives, architectural building blocks, and algorithmic training steps. For each model type, we present the optimization techniques as well as common failure modes and limitations. We also review recent developments in video generation and present the research that made it possible to go from still frames to high-quality videos. Lastly, we cover the growing importance of robustness…
