Image Generation Models: A Technical History

Rouzbeh Shirvani

arXiv:2603.07455 · cs.CV · March 31, 2026

TL;DR

This paper surveys key image generation models, detailing their architectures, training methods, and limitations, and covers recent advances in video generation and responsible deployment.

Contribution

It offers a detailed technical overview of various image generation models and discusses recent developments and challenges in the field.

Findings

01 Detailed walkthrough of VAEs, GANs, flows, transformers, and diffusion models.

02 Analysis of failure modes, limitations, and robustness issues.

03 Overview of recent progress in video generation and deepfake detection.

Abstract

Image generation has advanced rapidly over the past decade, yet the literature seems fragmented across different models and application domains. This paper aims to offer a comprehensive survey of breakthrough image generation models, including variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows, autoregressive and transformer-based generators, and diffusion-based methods. We provide a detailed technical walkthrough of each model type, including its underlying objectives, architectural building blocks, and algorithmic training steps. For each model type, we present the optimization techniques as well as common failure modes and limitations. We also review recent developments in video generation and present the research that made it possible to go from still frames to high-quality videos. Lastly, we cover the growing importance of robustness…
