Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning
Ziyi Zhang, Li Shen, Deheng Ye, Yong Luo, Huangxuan Zhao, Meng Liu, Wei Yu, Lefei Zhang

TL;DR
This paper introduces MVC-ZigAL, a reinforcement learning framework tailored for few-step text-to-multiview diffusion models, improving view fidelity and consistency through joint modeling, advantage learning, and a dual optimization scheme.
Contribution
The paper presents an RL finetuning method designed specifically for few-step T2MV diffusion models. It targets two gaps in existing single-image approaches: the lack of cross-view coordination and the weak learning signals that arise in few-step regimes.
Findings
Significant improvements in per-view fidelity.
Enhanced cross-view consistency.
Effective RL finetuning with joint-view reward and advantage learning.
Abstract
Text-to-multiview (T2MV) diffusion models have shown great promise in generating multiple views of a scene from a single text prompt. While few-step backbones enable real-time T2MV generation, they often compromise key aspects of generation quality, such as per-view fidelity and cross-view consistency. Reinforcement learning (RL) finetuning offers a potential solution, yet existing approaches designed for single-image diffusion do not readily extend to the few-step T2MV setting, as they neglect cross-view coordination and suffer from weak learning signals in few-step regimes. To address this, we propose MVC-ZigAL, a tailored RL finetuning framework for few-step T2MV diffusion models. Specifically, its core insights are: (1) a new MDP formulation that jointly models all generated views and assesses their collective quality via a joint-view reward; (2) a novel advantage learning strategy…
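The joint-view reward in point (1) can be pictured as a single scalar assigned to the whole set of generated views rather than to each image separately. The abstract does not give its form, so the sketch below is only an illustration under stated assumptions: the function name `joint_view_reward`, the inputs `fidelity` and `consistency`, the `weight` parameter, and the additive aggregation are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def joint_view_reward(fidelity, consistency, weight=0.5):
    """Collapse a set of views into one scalar reward (illustrative only).

    fidelity:    list of per-view quality scores, one float per view
                 (assumed to come from some image scorer).
    consistency: square matrix (list of lists) where consistency[i][j]
                 scores how well views i and j agree (assumed input).
    weight:      hypothetical trade-off between the two terms.
    """
    n = len(fidelity)
    if n < 2:
        raise ValueError("a joint-view reward needs at least two views")

    # Average quality of the individual views.
    mean_fidelity = mean(fidelity)

    # Average agreement over every unordered pair of views.
    pairs = combinations(range(n), 2)
    mean_consistency = mean(consistency[i][j] for i, j in pairs)

    # Assumed additive combination; the paper may aggregate differently.
    return mean_fidelity + weight * mean_consistency


if __name__ == "__main__":
    scores = [0.8, 0.6, 0.7, 0.9]
    agreement = [[0.5] * 4 for _ in range(4)]
    print(joint_view_reward(scores, agreement))
```

The point of the sketch is that all views of one prompt share a single reward, so a policy update cannot improve one view at the expense of agreement with the others.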
Taxonomy
Topics: Video Analysis and Summarization
Methods: Diffusion · Balanced Selection
