Sim2Real Diffusion: Leveraging Foundation Vision Language Models for Adaptive Automated Driving

Chinmay Vilas Samak; Tanmay Vilas Samak; Bing Li; Venkat Krovi

arXiv:2507.00236·cs.RO·November 21, 2025

TL;DR

This paper introduces a unified diffusion-based framework that improves sim2real transfer for autonomous driving by leveraging foundation models, enabling adaptation across diverse conditions from limited data while maintaining real-time performance.

Contribution

It proposes a novel conditional latent diffusion approach for cross-domain adaptation in autonomous driving, integrating foundation models, few-shot fine-tuning, and multi-modal prompts.

Findings

01. Bridges perceptual sim2real gap by over 40%.

02. Achieves robust performance with limited examples.

03. Supports diverse domain conditions like weather and seasons.

Abstract

Simulation-based design, optimization, and validation of autonomous vehicles have proven to be crucial for their improvement over the years. Nevertheless, the ultimate measure of effectiveness is their successful transition from simulation to reality (sim2real). However, existing sim2real transfer methods struggle to address the autonomy-oriented requirements of balancing: (i) conditioned domain adaptation, (ii) robust performance with limited examples, (iii) modularity in handling multiple domain representations, and (iv) real-time performance. To alleviate these pain points, we present a unified framework for learning cross-domain adaptive representations through conditional latent diffusion for sim2real transferable automated driving. Our framework offers options to leverage: (i) alternate foundation models, (ii) a few-shot fine-tuning pipeline, and (iii) textual as well as image…
