MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability

Buyu Liu; Kai Wang; Yansong Liu; Jun Bao; Tingting Han; Jun Yu

arXiv:2407.19468·cs.CV·July 30, 2024

TL;DR

MVPbev generates multi-view, photorealistic perspective images from text prompts and BEV semantics, with test-time controllability and improved generalization to unseen viewpoints.

Contribution

The paper introduces MVPbev, a two-stage model that projects BEV semantics to perspective views and enforces cross-view consistency, enabling controllable and generalizable multi-view image generation from text.

Findings

01. Outperforms state-of-the-art methods on the nuScenes dataset

02. Generates high-resolution photorealistic images from text

03. Demonstrates strong generalization to unseen viewpoints

Abstract

This work addresses multi-view perspective RGB image generation from text prompts given Bird's-Eye-View (BEV) semantics. Unlike prior methods that neglect layout consistency, cannot handle detailed text prompts, or fail to generalize to unseen viewpoints, MVPbev simultaneously generates cross-view consistent images of different perspective views with a two-stage design, allowing object-level control and novel view generation at test time. Specifically, MVPbev first projects the given BEV semantics to the perspective view using camera parameters, which enables the model to generalize to unseen viewpoints. It then introduces a multi-view attention module, in which special initialization and denoising processes explicitly enforce local consistency among overlapping views w.r.t. cross-view homography. Last but not least, MVPbev further allows test-time…
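The first stage described in the abstract, projecting BEV semantics to a perspective view with camera parameters, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a flat ground plane (Z = 0), a pinhole camera, and an ego frame with X forward, Y left, Z up. All names here (`ground_homography`, `project_bev_point`, and the example intrinsics and extrinsics) are hypothetical. Under those assumptions, a ground point (X, Y, 0) maps to pixels through the homography H = K [r1 r2 t], where r1 and r2 are the first two columns of the world-to-camera rotation.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ground_homography(K, R, t):
    """Build H = K [r1 r2 t] for points on the ground plane Z = 0.

    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation.
    (Hypothetical helper, assuming a flat ground plane.)
    """
    M = [[R[i][0], R[i][1], t[i]] for i in range(3)]
    return matmul(K, M)


def project_bev_point(H, X, Y):
    """Map a BEV ground point (X, Y) to pixel (u, v); None if behind the camera."""
    u, v, w = (row[0] * X + row[1] * Y + row[2] for row in H)
    if w <= 0:
        return None
    return (u / w, v / w)


# Assumed example camera: 1.5 m above the ground, looking along +X.
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
# Camera axes (x right, y down, z forward) expressed in the ego frame.
R = [[0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0],
     [1.0, 0.0, 0.0]]
t = [0.0, 1.5, 0.0]  # t = -R @ C with camera center C = (0, 0, 1.5)

H = ground_homography(K, R, t)
ahead = project_bev_point(H, 10.0, 0.0)    # ground point 10 m straight ahead
left = project_bev_point(H, 10.0, 2.0)     # same distance, 2 m to the left
behind = project_bev_point(H, -5.0, 0.0)   # behind the camera, not visible
```

Applying the same mapping to every BEV cell, and swapping in a different `K`, `R`, `t`, would give a perspective-view semantic layout for an arbitrary camera, which is one way to read the abstract's claim that projection with camera parameters supports unseen viewpoints.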

Code & Models

Repositories

kkaiwwana/mvpbev
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Medical Image Segmentation Techniques

Methods: Softmax · Attention Is All You Need · Diffusion