Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances

Yuanzhi Liang; Yijie Fang; Ke Hao; Rui Li; Ziqi Ni; Ruijie Su; Chi Zhang

arXiv:2508.10316 · cs.CV · January 21, 2026

TL;DR

This paper surveys how reinforcement learning can be integrated with visual generative models to improve controllability, alignment with high-level goals, and realism in image, video, and 3D content creation.

Contribution

The survey provides a comprehensive overview of RL methods in visual generative modeling, highlighting recent advances and future challenges.

Findings

01. RL enhances controllability and semantic accuracy in generative models.

02. Integration of RL improves alignment with complex, high-level objectives.

03. RL serves as both a fine-tuning tool and a structural component in generation.

Abstract

Generative models have made significant progress in synthesizing visual content, including images, videos, and 3D/4D structures. However, they are typically trained with surrogate objectives such as likelihood or reconstruction loss, which often misalign with perceptual quality, semantic accuracy, or physical realism. Reinforcement learning (RL) offers a principled framework for optimizing non-differentiable, preference-driven, and temporally structured objectives. Recent advances demonstrate its effectiveness in enhancing controllability, consistency, and human alignment across generative tasks. This survey provides a systematic overview of RL-based methods for visual content generation. We review the evolution of RL from classical control to its role as a general-purpose optimization tool, and examine its integration into image, video, and 3D/4D generation. Across these domains, RL…
