VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
Yibo Liu, Zheyuan Yang, Guile Wu, Yuan Ren, Kejian Lin, Bingbing Liu, Yang Liu, Jinjun Shan

TL;DR
VQA-Diff introduces a zero-shot framework leveraging VQA and diffusion models to generate realistic 3D vehicle assets from in-the-wild images, addressing limitations of prior methods in real-world scenarios.
Contribution
The paper presents a novel zero-shot approach combining VQA and diffusion models for 3D vehicle generation without large-scale training data.
Findings
Outperforms state-of-the-art methods on multiple datasets
Demonstrates robust zero-shot prediction in real-world conditions
Achieves high-quality 3D vehicle asset generation
Abstract
Generating 3D vehicle assets from in-the-wild observations is crucial to autonomous driving. Existing image-to-3D methods cannot address this problem well because they learn generation merely from image RGB information, without a deeper understanding of in-the-wild vehicles (such as car models, manufacturers, etc.). This leaves them with poor zero-shot prediction capability when handling real-world observations with occlusion or tricky viewing angles. To solve this problem, we propose VQA-Diff, a novel framework that leverages in-the-wild vehicle images to create photorealistic 3D vehicle assets for autonomous driving. VQA-Diff exploits the real-world knowledge inherited from the Large Language Model in the Visual Question Answering (VQA) model for robust zero-shot prediction, and the rich image prior knowledge in the Diffusion model for structure and appearance generation. In…
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Advanced X-ray and CT Imaging
Methods: Diffusion
