Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training
Hongzhi Ruan, Pei Liu, Weiliang Ma, Zhengning Li, Xueyang Zhang, Jun Ma, Dan Xu, Kun Zhan

TL;DR
This paper introduces AutoScale, a closed-loop data optimization framework for autonomous driving, which dynamically adjusts real and synthetic data mixtures to improve model performance efficiently.
Contribution
It proposes a novel automated data mixture optimization method using closed-loop feedback, scene representation, and sample retrieval for autonomous driving models.
Findings
AutoScale outperforms baseline co-training methods.
Achieves better performance with fewer synthetic samples.
Effective under constrained training budgets.
Abstract
Data scaling is fundamental to modern deep learning, and grows increasingly critical as autonomous driving shifts to end-to-end learning. Real-world driving data is expensive to annotate and scene-biased, making real-synthetic co-training with near-infinite synthetic data a promising direction. However, naively incorporating all available synthetic data is inefficient and leads to distribution shifts, and optimizing data mixture under practical training budgets remains a critical yet under-explored problem. In this sense, we claim that the mixture of training data requires clear guidance in terms of scene types and quantities. Particularly in this work, we conceptualize the data mixture approximately as a dynamic optimization process that iteratively adjusts the training data mixture to maximize model performance, guided by closed-loop evaluation feedback, and propose AutoScale, a fully…
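To make the idea of feedback-guided mixture adjustment concrete, here is a minimal sketch of one update step. It is not AutoScale's actual algorithm, which the truncated abstract does not specify. The function name `allocate_synthetic_budget`, the per-scene failure counts, and the proportional allocation rule are all assumptions chosen for illustration.

```python
from typing import Dict


def allocate_synthetic_budget(failure_counts: Dict[str, int], budget: int) -> Dict[str, int]:
    """Split a synthetic-sample budget across scene types.

    Hypothetical rule (an assumption, not taken from the paper): each scene
    type receives a share of the budget proportional to how often the current
    model failed on it during closed-loop evaluation.
    """
    total = sum(failure_counts.values())
    if total == 0:
        # No observed failures: add no synthetic samples in this round.
        return {scene: 0 for scene in failure_counts}

    # Integer floor allocation, so the result never exceeds the budget.
    allocation = {
        scene: budget * count // total for scene, count in failure_counts.items()
    }

    # Hand out the samples lost to flooring, worst-performing scenes first.
    leftover = budget - sum(allocation.values())
    worst_first = sorted(failure_counts, key=failure_counts.get, reverse=True)
    for scene in worst_first[:leftover]:
        allocation[scene] += 1
    return allocation


# Example round: failure counts per scene type from a closed-loop evaluation.
failures = {"intersection": 6, "highway": 1, "parking": 3}
mixture = allocate_synthetic_budget(failures, budget=100)
print(mixture)
```

In an iterative setting, a step like this would be repeated after each round of training and closed-loop evaluation, so the synthetic portion of the mixture follows the model's current weaknesses rather than staying fixed.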
