Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning

Ziyi Zhang; Li Shen; Deheng Ye; Yong Luo; Huangxuan Zhao; Meng Liu; Wei Yu; Lefei Zhang

arXiv:2505.20107 · cs.LG · March 18, 2026

TL;DR

This paper introduces MVC-ZigAL, a reinforcement learning framework tailored to few-step text-to-multiview diffusion models. It improves per-view fidelity and cross-view consistency through joint-view modeling, advantage learning, and a dual optimization scheme.

Contribution

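The paper presents a novel RL finetuning method designed specifically for few-step text-to-multiview (T2MV) diffusion models. It addresses two problems that RL methods built for single-image diffusion leave open: coordinating quality across views, and the weak learning signals available in few-step regimes.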

Findings

1. Significant improvements in per-view fidelity.
2. Enhanced cross-view consistency.
3. Effective RL finetuning with a joint-view reward and advantage learning.

Abstract

Text-to-multiview (T2MV) diffusion models have shown great promise in generating multiple views of a scene from a single text prompt. While few-step backbones enable real-time T2MV generation, they often compromise key aspects of generation quality, such as per-view fidelity and cross-view consistency. Reinforcement learning (RL) finetuning offers a potential solution, yet existing approaches designed for single-image diffusion do not readily extend to the few-step T2MV setting, as they neglect cross-view coordination and suffer from weak learning signals in few-step regimes. To address this, we propose MVC-ZigAL, a tailored RL finetuning framework for few-step T2MV diffusion models. Specifically, its core insights are: (1) a new MDP formulation that jointly models all generated views and assesses their collective quality via a joint-view reward; (2) a novel advantage learning strategy…
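The abstract's first insight treats all generated views as one joint action scored by a single joint-view reward, rather than rewarding each image independently as single-image RL methods do. The sketch below contrasts the two. It is a minimal illustration under stated assumptions, not the paper's formulation: representing views as feature vectors, the `fidelity` and `consistency` scorers, the weight `lam`, and the plain REINFORCE surrogate with a scalar baseline are all hypothetical stand-ins. The paper's own advantage learning strategy is not reproduced.

```python
import statistics
from typing import List, Sequence

# Hypothetical sketch, NOT MVC-ZigAL's actual reward or loss.
# Each generated view is summarized by a small feature vector (an assumption).
View = List[float]


def fidelity(view: View) -> float:
    """Hypothetical per-view quality score (stand-in for a learned image scorer)."""
    return sum(view) / len(view)


def consistency(a: View, b: View) -> float:
    """Hypothetical cross-view agreement: negative mean absolute feature gap."""
    return -sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def independent_rewards(views: Sequence[View]) -> List[float]:
    """Single-image-style baseline: each view is rewarded alone, no cross-view term."""
    return [fidelity(v) for v in views]


def joint_view_reward(views: Sequence[View], lam: float = 1.0) -> float:
    """Score the whole set of views as ONE action with ONE scalar reward:
    mean per-view fidelity plus lam times mean pairwise consistency."""
    fid = statistics.mean(fidelity(v) for v in views)
    pairs = [(i, j) for i in range(len(views)) for j in range(i + 1, len(views))]
    if pairs:
        cons = statistics.mean(consistency(views[i], views[j]) for i, j in pairs)
    else:
        cons = 0.0  # a single view has no cross-view term
    return fid + lam * cons


def reinforce_loss(log_probs: Sequence[Sequence[float]], reward: float, baseline: float) -> float:
    """Plain REINFORCE surrogate (an assumed stand-in for the paper's objective).

    log_probs[t][v] is the log-probability of view v at denoising step t.
    Because the action covers all views jointly, every view at every step
    is weighted by the same scalar advantage.
    """
    advantage = reward - baseline
    return -advantage * sum(sum(step) for step in log_probs)


if __name__ == "__main__":
    consistent = [[0.9, 0.9], [0.9, 0.9], [0.9, 0.9]]
    sharper_but_inconsistent = [[1.0, 1.0], [1.0, 1.0], [0.6, 1.4]]
    for name, views in [("consistent", consistent), ("inconsistent", sharper_but_inconsistent)]:
        print(name, statistics.mean(independent_rewards(views)), round(joint_view_reward(views), 3))
```

With these illustrative inputs, the second set scores higher under independent per-view rewards but lower under the joint-view reward, because the joint reward also charges for disagreement between views. The scalar baseline is only the simplest variance-reduction choice.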

Code & Models

Repositories

ziyizhang27/mvc-zigal (PyTorch, official)

Taxonomy

Topics: Video Analysis and Summarization

Methods: Diffusion · Balanced Selection