Sim2Real Diffusion: Leveraging Foundation Vision Language Models for Adaptive Automated Driving
Chinmay Vilas Samak, Tanmay Vilas Samak, Bing Li, Venkat Krovi

TL;DR
This paper introduces a unified diffusion-based framework that improves sim2real transfer for autonomous driving by leveraging foundation models, enabling adaptation across diverse conditions from limited data while meeting real-time performance requirements.
Contribution
It proposes a novel conditional latent diffusion approach for cross-domain adaptation in autonomous driving, integrating foundation models, few-shot fine-tuning, and multi-modal prompts.
Findings
Bridges the perceptual sim2real gap by over 40%.
Achieves robust performance with limited examples.
Supports diverse domain conditions like weather and seasons.
Abstract
Simulation-based design, optimization, and validation of autonomous vehicles have proven to be crucial for their improvement over the years. Nevertheless, the ultimate measure of effectiveness is their successful transition from simulation to reality (sim2real). However, existing sim2real transfer methods struggle to address the autonomy-oriented requirements of balancing: (i) conditioned domain adaptation, (ii) robust performance with limited examples, (iii) modularity in handling multiple domain representations, and (iv) real-time performance. To alleviate these pain points, we present a unified framework for learning cross-domain adaptive representations through conditional latent diffusion for sim2real transferable automated driving. Our framework offers options to leverage: (i) alternate foundation models, (ii) a few-shot fine-tuning pipeline, and (iii) textual as well as image…
