Auto-Regressively Generating Multi-View Consistent Images

JiaKui Hu; Yuxiao Yang; Jialun Liu; Jinbo Wu; Chen Zhao; Yanye Lu

arXiv:2506.18527 · cs.CV · July 15, 2025


TL;DR

This paper introduces MV-AR, an auto-regressive method that generates multi-view images from prompts. It keeps the views consistent with one another and handles diverse conditions through dedicated training and data augmentation strategies.

Contribution

The paper presents an auto-regressive approach to multi-view image synthesis that combines condition injection, progressive training, and Shuffle View data augmentation.

Findings

01 MV-AR generates consistent multi-view images across various conditions.

02 The method performs comparably to leading diffusion-based models.

03 Shuffle View augmentation significantly expands training data.
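
To see why reordering views can expand a dataset so much, here is a minimal sketch. It assumes that Shuffle View permutes the order in which the views of one object are presented to the model; the summary above does not spell out the mechanism, so treat that as an assumption. The names `shuffle_views` and `num_orderings`, and the (image, pose) pair layout, are hypothetical and not taken from the paper or its code.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

View = TypeVar("View")


def shuffle_views(
    views: Sequence[View], rng: random.Random
) -> Tuple[List[View], List[int]]:
    """Return the views in a random order, plus the permutation used.

    Assumption: each element bundles an image with its camera pose,
    so permuting elements keeps every image paired with its own pose.
    """
    order = list(range(len(views)))
    rng.shuffle(order)  # in-place random permutation of the indices
    return [views[i] for i in order], order


def num_orderings(n_views: int) -> int:
    """Number of distinct view orderings available for one object."""
    return math.factorial(n_views)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sample: four views, each an (image, pose) placeholder.
    sample = [("img0", "pose0"), ("img1", "pose1"),
              ("img2", "pose2"), ("img3", "pose3")]
    shuffled, order = shuffle_views(sample, rng)
    print(order, num_orderings(len(sample)))
```

Under this assumption, an object captured from n views yields up to n! training sequences, which is one way an augmentation of this kind could enlarge the effective training set without new renders.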

Abstract

Generating multi-view images from human instructions is crucial for 3D content creation. The primary challenges are maintaining consistency across multiple views and effectively synthesizing shapes and textures under diverse conditions. In this paper, we propose the Multi-View Auto-Regressive (MV-AR) method, which leverages an auto-regressive model to progressively generate consistent multi-view images from arbitrary prompts. First, the next-token-prediction capability of the AR model makes it well suited to progressive multi-view synthesis: when generating widely separated views, MV-AR can draw on all of its preceding views to extract effective reference information. Second, we propose a unified model that accommodates various prompts through architectural design and training strategies. To address multiple conditions, we introduce…
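
The progressive generation described in the abstract can be written as a standard auto-regressive factorization over views. This is a sketch in notation introduced here, not taken from the paper: $x_i$ is the token sequence of the $i$-th view, $N$ the number of views, and $c$ the prompt or condition.

```latex
p(x_1, \dots, x_N \mid c) \;=\; \prod_{i=1}^{N} p\left(x_i \mid x_{<i},\, c\right),
\qquad x_{<i} = (x_1, \dots, x_{i-1})
```

Each factor conditions on every earlier view, which matches the abstract's point that a widely separated view can still use all preceding views as reference.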

Code & Models

Models

🤗 Jiakui/MV-AR (model)