From Noise to Nuance: Advances in Deep Generative Image Models
Benji Peng, Chia Xin Liang, Ziqian Bi, Ming Liu, Yichao Zhang, Tianyang Wang, Keyu Chen, Xinyuan Song, Pohsun Feng

TL;DR
This paper reviews recent architectural and computational innovations in deep generative image models, highlighting advances in diffusion, transformers, and efficiency, while discussing ongoing challenges and future directions.
Contribution
It provides a comprehensive analysis of recent developments in deep generative image models, emphasizing architectural innovations and their impact on efficiency and quality.
Findings
Transformative impact of diffusion and transformer architectures
Enhanced multi-modal and zero-shot generation capabilities
Persistent challenges in resource-efficient and interpretable models
Abstract
Deep learning-based image generation has undergone a paradigm shift since 2021, marked by fundamental architectural breakthroughs and computational innovations. By reviewing architectural innovations and empirical results, this paper analyzes the transition from traditional generative methods to advanced architectures, with a focus on compute-efficient diffusion models and vision transformer architectures. We examine how recent developments in Stable Diffusion, DALL-E, and consistency models have redefined the capabilities and performance boundaries of image synthesis, while addressing persistent challenges in efficiency and quality. Our analysis focuses on the evolution of latent space representations, cross-attention mechanisms, and parameter-efficient training methodologies that enable accelerated inference under resource constraints. While more efficient training methods enable…
Taxonomy
Methods: Attention Is All You Need · Consistency Models · Linear Layer · Softmax · Dense Connections · Multi-Head Attention · Layer Normalization · Residual Connection · Vision Transformer · Diffusion
