TL;DR
This paper introduces GenAssets, a 3D latent diffusion model that generates diverse, high-quality 3D assets from in-the-wild LiDAR and camera data, improving simulation realism.
Contribution
It proposes a novel reconstruct-then-generate approach using occlusion-aware neural rendering and latent diffusion to produce complete 3D assets from limited in-the-wild data.
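As a rough illustration of the two stages named above, the sketch below pairs an occlusion-masked reconstruction loss with the standard DDPM forward-noising step applied to a latent vector. It is a toy under stated assumptions, not the paper's implementation: the function names (`masked_photometric_loss`, `linear_alpha_bar`, `noise_latent`), the per-pixel visibility mask, the linear beta schedule, and the flat list-of-floats latent are all hypothetical choices made for illustration.

```python
import math
import random


def masked_photometric_loss(rendered, observed, visible):
    """Mean squared error over pixels flagged as visible (not occluded).

    Assumption: occlusion awareness is modelled here simply by skipping
    occluded pixels, so they never penalise the reconstruction.
    """
    total, count = 0.0, 0
    for r, o, v in zip(rendered, observed, visible):
        if v:
            total += (r - o) ** 2
            count += 1
    return total / count if count else 0.0


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    Assumption: a generic DDPM-style schedule; the paper's schedule may differ.
    """
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def noise_latent(latent, t, alpha_bar, rng):
    """Sample z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in latent]
    a = math.sqrt(alpha_bar[t])
    b = math.sqrt(1.0 - alpha_bar[t])
    noisy = [a * z + b * e for z, e in zip(latent, eps)]
    return noisy, eps


if __name__ == "__main__":
    # Stage 1 (reconstruct): only the two visible pixels contribute to the loss.
    loss = masked_photometric_loss([1.0, 2.0, 3.0], [1.0, 0.0, 5.0],
                                   [True, False, True])
    print("masked loss:", loss)

    # Stage 2 (generate): noise a hypothetical object latent at step t = 5.
    schedule = linear_alpha_bar(10)
    z_t, eps = noise_latent([0.5, -0.5, 0.1], 5, schedule, random.Random(0))
    print("noised latent:", z_t)
```

In a diffusion setup of this kind, a denoiser would be trained to predict `eps` from the noised latent and the step index, and a decoder would then render the sampled latent back into a 3D asset. Both of those parts are omitted here.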
Findings
Outperforms existing reconstruction and generation methods.
Produces diverse, complete 3D assets suitable for simulation.
Leverages occlusion-aware neural rendering to build a high-quality latent space.
Abstract
High-quality 3D assets for traffic participants are critical for multi-sensor simulation, which is essential for the safe end-to-end development of autonomy. Building assets from in-the-wild data is key for diversity and realism, but existing neural-rendering based reconstruction methods are slow and generate assets that render well only from viewpoints close to the original observations, limiting their usefulness in simulation. Recent diffusion-based generative models build complete and diverse assets, but perform poorly on in-the-wild driving scenes, where observed actors are captured under sparse and limited fields of view, and are partially occluded. In this work, we propose a 3D latent diffusion model that learns on in-the-wild LiDAR and camera data captured by a sensor platform and generates high-quality 3D assets with complete geometry and appearance. Key to our method is a…