TL;DR
CRAFT introduces an on-policy framework for autonomous driving policy fine-tuning that combines counterfactual advantages with residual correction to improve closed-loop performance and stability.
Contribution
The paper formulates closed-loop policy fine-tuning as proxy-residual optimization, integrating dense counterfactual advantages with grounded residual correction from real closed-loop interaction.
Findings
CRAFT achieves the strongest closed-loop gains on Bench2Drive.
Ablation studies validate the roles of proxy and residual correction.
CRAFT demonstrates stability and transferability across architectures.
Abstract
Open-loop imitation learning has advanced modern autonomous driving policy architectures, but closed-loop deployment remains vulnerable to policy-induced distribution shift. Existing post-training paradigms exhibit fundamental trade-offs: closed-loop RL fine-tuning provides grounded feedback from executed actions but is constrained by the sparsity of informative events, whereas counterfactual fine-tuning provides dense supervision over candidate futures but inherits bias from imperfect future estimates. We introduce Counterfactual-to-Interactive Reinforcement Fine-Tuning (CRAFT), an on-policy framework that formulates closed-loop post-training as proxy-residual optimization. CRAFT uses group-normalized counterfactual advantages as a dense proxy for real closed-loop advantages and aligns this proxy with the closed-loop world through grounded residual correction from interaction-critical…
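The abstract's "group-normalized counterfactual advantages" can be illustrated with a short sketch. The first function below shows the common form of group normalization (subtract the group mean, divide by the group standard deviation), which is an assumption about what the term means here. The second function is purely hypothetical: the excerpt does not specify how CRAFT's residual correction is computed, so the function name, the `is_critical` flag, and the `beta` weight are illustrative placeholders, not the authors' method.

```python
from statistics import mean, pstdev


def group_normalized_advantages(scores, eps=1e-8):
    """Normalize the scores of one group of candidate futures.

    Each score becomes (score - group mean) / (group std + eps), so the
    advantages within a group are zero-mean and comparable across groups.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of the group
    return [(s - mu) / (sigma + eps) for s in scores]


def proxy_residual_advantage(proxy_adv, grounded_adv, is_critical, beta=1.0):
    """Hypothetical blend of a proxy advantage with grounded feedback.

    Assumption (not from the paper): where grounded closed-loop feedback is
    available (is_critical is True), shift the proxy toward it by a residual
    weighted with beta; otherwise keep the dense proxy unchanged.
    """
    if not is_critical:
        return proxy_adv
    residual = grounded_adv - proxy_adv
    return proxy_adv + beta * residual


# Three candidate futures scored by some counterfactual estimator (made-up values).
advantages = group_normalized_advantages([1.0, 2.0, 3.0])
```

In this sketch the proxy supplies a signal for every candidate, while the residual term only acts where executed actions provide grounded feedback, mirroring the dense-versus-grounded trade-off the abstract describes.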
