MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability
Buyu Liu, Kai Wang, Yansong Liu, Jun Bao, Tingting Han, Jun Yu

TL;DR
MVPbev is a novel method for generating multi-view, photorealistic images from text prompts using BEV semantics, with test-time controllability and improved generalization to unseen views.
Contribution
The paper introduces MVPbev, a two-stage model that projects BEV semantics to perspective views and enforces cross-view consistency, enabling controllable and generalizable multi-view image generation from text.
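The first stage, projecting BEV semantics into each perspective view using camera parameters, can be illustrated with a small inverse-mapping sketch. It is not MVPbev's code. It assumes that every BEV cell lies on a flat ground plane z = 0 in the ego frame, that the grid is centered on the ego vehicle, and that a pinhole camera has intrinsics `K` and ego-to-camera extrinsics `R`, `t`. The function name `bev_to_perspective` and all example values are hypothetical.

```python
import numpy as np


def bev_to_perspective(bev_labels, cell_size, K, R, t, image_hw):
    """Render a BEV semantic grid into one camera view (illustrative sketch).

    Assumptions (hypothetical, not from the paper):
    * every BEV cell lies on the flat ground plane z = 0 of the ego frame;
    * the grid is centered on the ego vehicle, rows run along +x (forward,
      row 0 farthest ahead) and columns along +y (left, column 0 leftmost);
    * K is a 3x3 pinhole intrinsic matrix and (R, t) map ego-frame points
      into camera coordinates: X_cam = R @ X_ego + t.
    Returns an (h, w) array of class ids, with -1 where the pixel's ray
    does not hit the ground inside the grid.
    """
    h, w = image_hw
    n_rows, n_cols = bev_labels.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([u.ravel(), v.ravel(), np.ones(u.size)])  # 3 x N homogeneous

    # Back-project each pixel to a viewing ray expressed in the ego frame.
    rays = R.T @ (np.linalg.inv(K) @ pix)
    center = -R.T @ t  # camera center in ego coordinates

    # Ray-plane intersection with z = 0: center_z + s * ray_z = 0.
    s = np.full(u.size, -1.0)
    nonflat = np.abs(rays[2]) > 1e-12
    s[nonflat] = -center[2] / rays[2, nonflat]
    hits = s > 0  # keep intersections in front of the camera

    ground = center[:, None] + s[hits] * rays[:, hits]
    row = np.floor(n_rows / 2 - ground[0] / cell_size).astype(np.int64)
    col = np.floor(n_cols / 2 - ground[1] / cell_size).astype(np.int64)
    inside = (row >= 0) & (row < n_rows) & (col >= 0) & (col < n_cols)

    out = np.full(u.size, -1, dtype=np.int64)
    out[np.flatnonzero(hits)[inside]] = bev_labels[row[inside], col[inside]]
    return out.reshape(h, w)


# Example: a forward-facing camera 1.5 m above the ground.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
# Ego axes (x fwd, y left, z up) -> camera axes (x right, y down, z fwd).
R = np.array([[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])
t = -R @ np.array([0.0, 0.0, 1.5])  # camera center at height 1.5 m
bev = np.zeros((20, 20), dtype=np.int64)  # 20 m x 20 m grid, 1 m cells
bev[4, 9] = 3  # a single "class 3" cell about 5.5 m ahead
seg = bev_to_perspective(bev, 1.0, K, R, t, (100, 100))
```

In the example, pixel (row 77, column 41) sees the ground about 5.6 m ahead and 0.5 m to the left, so it receives class 3, while every pixel in rows 0 to 50 (at or above the horizon) stays -1. Inverse mapping, which casts one ray per pixel, yields a dense semantic map without the holes that forward-splatting BEV cells would leave. Objects with height, such as vehicles, would need extra handling that this flat-ground sketch omits.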
Findings
Outperforms state-of-the-art methods on the nuScenes dataset
Generates high-resolution photorealistic images from text
Demonstrates strong generalization to unseen viewpoints
Abstract
This work addresses multi-view perspective RGB generation from text prompts given Bird's-Eye-View (BEV) semantics. Unlike prior methods that neglect layout consistency, cannot handle detailed text prompts, or fail to generalize to unseen viewpoints, MVPbev simultaneously generates cross-view consistent images of different perspective views with a two-stage design, allowing object-level control and novel-view generation at test time. Specifically, MVPbev first projects the given BEV semantics to the perspective views using camera parameters, which enables the model to generalize to unseen viewpoints. We then introduce a multi-view attention module with special initialization and denoising processes that explicitly enforce local consistency among overlapping views with respect to the cross-view homography. Last but not least, MVPbev further allows test-time…
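The cross-view homography mentioned in the abstract relates pixels in the overlap of two neighboring cameras. The sketch below is a simplification and not the paper's procedure. It assumes the overlapping content lies on a single plane n^T X = d in camera-1 coordinates, so that H = K2 (R + t n^T / d) K1^{-1}, and it uses a nearest-neighbor warp plus a masked L1 score as a stand-in for a consistency check. The helpers `plane_homography`, `warp_to_view`, and `overlap_consistency` are hypothetical, and how MVPbev actually applies the homography inside its initialization and denoising is not specified here.

```python
import numpy as np


def plane_homography(K1, K2, R, t, n, d):
    """Homography taking view-1 pixels to view-2 pixels for points on a plane.

    Assumes the plane is n^T X = d in camera-1 coordinates and that
    X2 = R @ X1 + t maps camera-1 points into camera 2, which gives
    H = K2 (R + t n^T / d) K1^{-1}.
    """
    A = R + np.outer(t, n) / d
    return K2 @ A @ np.linalg.inv(K1)


def warp_to_view(src, H, out_hw):
    """Nearest-neighbor warp of an (h, w, c) array from view 1 into view 2.

    Each target pixel x2 samples the source at x1 = H^{-1} x2. Returns the
    warped array and a boolean mask marking pixels that landed inside src.
    """
    h2, w2 = out_hw
    v, u = np.meshgrid(np.arange(h2), np.arange(w2), indexing="ij")
    pts = np.linalg.inv(H) @ np.stack([u.ravel(), v.ravel(), np.ones(u.size)])
    us = np.full(u.size, -1, dtype=np.int64)
    vs = np.full(u.size, -1, dtype=np.int64)
    ok = np.abs(pts[2]) > 1e-12  # skip points mapped to infinity
    us[ok] = np.round(pts[0, ok] / pts[2, ok]).astype(np.int64)
    vs[ok] = np.round(pts[1, ok] / pts[2, ok]).astype(np.int64)
    h1, w1 = src.shape[:2]
    ok &= (us >= 0) & (us < w1) & (vs >= 0) & (vs < h1)
    out = np.zeros((h2 * w2,) + src.shape[2:], dtype=np.float64)
    out[ok] = src[vs[ok], us[ok]]
    return out.reshape((h2, w2) + src.shape[2:]), ok.reshape(h2, w2)


def overlap_consistency(feat1, feat2, H):
    """Mean absolute difference on the overlap between view 2 and view 1
    warped into view 2 (a hypothetical stand-in for a consistency term)."""
    warped, mask = warp_to_view(feat1, H, feat2.shape[:2])
    if not mask.any():
        return 0.0
    return float(np.abs(feat2[mask].astype(np.float64) - warped[mask]).mean())


# Example: the cameras differ by a 0.1 m translation along x (X2 = X1 + t)
# and view a fronto-parallel plane 10 m away, so content shifts by
# f * 0.1 / 10 = 1 pixel between the views.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
H = plane_homography(K, K, np.eye(3), np.array([0.1, 0.0, 0.0]),
                     np.array([0.0, 0.0, 1.0]), 10.0)
rng = np.random.default_rng(0)
view1 = rng.random((64, 64, 3))
view2 = np.roll(view1, 1, axis=1)  # view-2 column u shows view-1 column u - 1
aligned = overlap_consistency(view1, view2, H)     # views agree on the overlap
misaligned = overlap_consistency(view1, view1, H)  # unshifted views disagree
```

The mask matters: only pixels whose source location falls inside the other view contribute, which mirrors the abstract's point that consistency is enforced locally, among overlapping views only.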
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Medical Image Segmentation Techniques
Methods: Softmax · Attention Is All You Need · Diffusion
