TL;DR
This paper introduces MV-AR, an auto-regressive method for generating multi-view images from prompts, ensuring view consistency and handling diverse conditions with novel training and data augmentation strategies.
Contribution
The paper presents a new auto-regressive approach for multi-view image synthesis, incorporating condition injection, progressive training, and Shuffle View data augmentation for improved performance.
Findings
MV-AR generates consistent multi-view images across various conditions.
The method performs comparably to leading diffusion-based models.
Shuffle View augmentation significantly expands training data.
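The summary does not spell out how Shuffle View works. A minimal sketch follows, assuming it amounts to randomly permuting the order in which an object's views (and their camera poses) appear in the training sequence. The names `shuffle_views` and `num_orderings` and the azimuth-only pose representation are hypothetical and not taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def shuffle_views(
    views: Sequence[str],
    azimuths: Sequence[float],
    rng: random.Random,
) -> Tuple[List[str], List[float]]:
    """Return views and their azimuths in one shared random order.

    Assumption: each view is paired with a camera azimuth, and the pairing
    must survive the shuffle so the model still knows which pose it is
    generating at each step.
    """
    if len(views) != len(azimuths):
        raise ValueError("views and azimuths must have the same length")
    order = list(range(len(views)))
    rng.shuffle(order)  # in-place random permutation of the indices
    return [views[i] for i in order], [azimuths[i] for i in order]


def num_orderings(n_views: int) -> int:
    """Number of distinct view orderings available for one object."""
    return math.factorial(n_views)


# Hypothetical example: four views of one object, 90 degrees apart.
views = ["front", "right", "back", "left"]
azimuths = [0.0, 90.0, 180.0, 270.0]
shuffled_views, shuffled_azimuths = shuffle_views(views, azimuths, random.Random(0))
```

Under this assumption, an object with n views yields up to n! distinct training sequences, which is one way such an augmentation could expand the training data.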
Abstract
Generating multi-view images from human instructions is crucial for 3D content creation. The primary challenges involve maintaining consistency across multiple views and effectively synthesizing shapes and textures under diverse conditions. In this paper, we propose the Multi-View Auto-Regressive (MV-AR) method, which leverages an auto-regressive model to progressively generate consistent multi-view images from arbitrary prompts. First, the next-token-prediction capability of the AR model makes it well suited to progressive multi-view synthesis. When generating widely separated views, MV-AR can draw on all of its preceding views to extract effective reference information. Second, we propose a unified model that accommodates various prompts through architecture design and training strategies. To address multiple conditions, we introduce…
