TL;DR
HorizonWeaver is a novel framework for realistic, controllable editing of complex driving scenes using language guidance, addressing multi-level semantics and domain shifts with a large-scale dataset and specialized models.
Contribution
It introduces a comprehensive approach combining large-scale data, language-guided masks, and joint training losses for scalable, high-fidelity scene editing in autonomous driving.
Findings
Outperforms prior methods in L1, CLIP, and DINO metrics.
Achieves a 46.4% improvement in user preference.
Improves bird's-eye-view (BEV) segmentation IoU by 33%.
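For readers unfamiliar with the IoU metric cited above, the following is a minimal sketch of intersection-over-union for two binary masks. It is only an illustration of the standard definition; the function name `iou` and the toy masks are hypothetical, and the paper's actual BEV evaluation protocol (classes, resolution, averaging) is not described in this summary.

```python
def iou(pred, target):
    """Intersection-over-union of two same-shaped binary masks.

    Masks are nested lists of 0/1 values (hypothetical toy format).
    Returns 1.0 when both masks are empty, by convention in this sketch.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # cell is set in both masks
            if p or t:
                union += 1  # cell is set in at least one mask
    return intersection / union if union else 1.0


# Toy example: 2 overlapping cells out of 4 cells covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)
```

Here `score` is the fraction of the combined mask area on which the prediction and the ground truth agree, so higher values indicate better segmentation.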
Abstract
Ensuring safety in autonomous driving requires scalable generation of realistic, controllable driving scenes beyond what real-world testing provides. Yet existing instruction-guided image editors, trained on object-centric or artistic data, struggle with dense, safety-critical driving layouts. We propose HorizonWeaver, which tackles three fundamental challenges in driving scene editing: (1) multi-level granularity, requiring coherent object- and scene-level edits in dense environments; (2) rich high-level semantics, preserving diverse objects while following detailed instructions; and (3) ubiquitous domain shifts, handling changes in climate, layout, and traffic across unseen environments. The core of HorizonWeaver is a set of complementary contributions across data, model, and training: (1) Data: Large-scale dataset generation, where we build a paired real/synthetic dataset from…
