VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving

Yibo Liu; Zheyuan Yang; Guile Wu; Yuan Ren; Kejian Lin; Bingbing Liu; Yang Liu; Jinjun Shan

arXiv:2407.06516·cs.CV·July 12, 2024·1 cites

Open Access

TL;DR

VQA-Diff introduces a zero-shot framework leveraging VQA and diffusion models to generate realistic 3D vehicle assets from in-the-wild images, addressing limitations of prior methods in real-world scenarios.

Contribution

The paper presents a novel zero-shot approach combining VQA and diffusion models for 3D vehicle generation without large-scale training data.

Findings

01 Outperforms state-of-the-art methods on multiple datasets
02 Demonstrates robust zero-shot prediction in real-world conditions
03 Achieves high-quality 3D vehicle asset generation

Abstract

Generating 3D vehicle assets from in-the-wild observations is crucial to autonomous driving. Existing image-to-3D methods do not address this problem well because they learn generation from image RGB information alone, without a deeper understanding of in-the-wild vehicles (such as car models and manufacturers). As a result, their zero-shot prediction is poor on real-world observations with occlusion or difficult viewing angles. To solve this problem, we propose VQA-Diff, a novel framework that leverages in-the-wild vehicle images to create photorealistic 3D vehicle assets for autonomous driving. VQA-Diff exploits the real-world knowledge inherited from the Large Language Model in the Visual Question Answering (VQA) model for robust zero-shot prediction, and the rich image prior knowledge in the Diffusion model for structure and appearance generation. In…

Taxonomy

Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Advanced X-ray and CT Imaging

Methods: Diffusion