From Noise to Nuance: Advances in Deep Generative Image Models

Benji Peng; Chia Xin Liang; Ziqian Bi; Ming Liu; Yichao Zhang; Tianyang Wang; Keyu Chen; Xinyuan Song; Pohsun Feng

arXiv:2412.09656·cs.CV·December 16, 2024

TL;DR

This paper reviews recent architectural and computational innovations in deep generative image models, highlighting advances in diffusion, transformers, and efficiency, while discussing ongoing challenges and future directions.

Contribution

It provides a comprehensive analysis of recent developments in deep generative image models, emphasizing architectural innovations and their impact on efficiency and quality.

Findings

1. Transformative impact of diffusion and transformer architectures
2. Enhanced multi-modal and zero-shot generation capabilities
3. Persistent challenges in resource-efficient and interpretable models

Abstract

Deep learning-based image generation has undergone a paradigm shift since 2021, marked by fundamental architectural breakthroughs and computational innovations. By reviewing architectural innovations and empirical results, this paper analyzes the transition from traditional generative methods to advanced architectures, with a focus on compute-efficient diffusion models and vision transformer architectures. We examine how recent developments in Stable Diffusion, DALL-E, and consistency models have redefined the capabilities and performance boundaries of image synthesis, while addressing persistent challenges in efficiency and quality. Our analysis focuses on the evolution of latent space representations, cross-attention mechanisms, and parameter-efficient training methodologies that enable accelerated inference under resource constraints. While more efficient training methods enable…

Taxonomy

Methods: Attention Is All You Need · Consistency Models · Linear Layer · Softmax · Dense Connections · Multi-Head Attention · Layer Normalization · Residual Connection · Vision Transformer · Diffusion