On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning
Amirhossein Afsharrad, Amirhesam Abedsoltan, Ahmadreza Moradipari, Sanjay Lall

TL;DR
This paper explores on-policy knowledge distillation from large language models to smaller models for autonomous vehicle motion planning, achieving near-teacher performance with significantly reduced model size.
Contribution
It applies on-policy generalized knowledge distillation (GKD) to train smaller, deployable LLMs for autonomous vehicle motion planning, and compares it against a dense-feedback reinforcement learning (RL) baseline.
Findings
On-policy GKD outperforms the dense-feedback RL baseline in the paper's experiments.
Smaller student models achieve performance close to that of the large teacher.
On-policy distillation is effective for deploying LLMs in autonomous vehicles.
Abstract
Large language models (LLMs) have recently demonstrated strong potential for autonomous vehicle motion planning by reformulating trajectory prediction as a language generation problem. However, deploying capable LLMs in resource-constrained onboard systems remains a fundamental challenge. In this paper, we study how to effectively transfer motion planning knowledge from a large teacher LLM to a smaller, more deployable student model. We build on the GPT-Driver framework, which represents driving scenes as language prompts and generates waypoint trajectories with chain-of-thought reasoning, and investigate two student training paradigms: (i) on-policy generalized knowledge distillation (GKD), which trains the student on its own self-generated outputs using dense token-level feedback from the teacher, and (ii) a dense-feedback reinforcement learning (RL) baseline that uses the teacher's…
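The on-policy distillation idea described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the "models" are bigram logit tables (hypothetical names `student_table` and `teacher_table`), and the per-token feedback is a reverse KL divergence, which is only one of the divergences GKD can use. The excerpt does not say which divergence or model sizes the authors chose. The sketch shows the two defining steps: the student samples its own sequence, and the teacher scores every position of that sequence.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def sample_on_policy(student_table, start_token, length, rng):
    """Sample a sequence from the student's own next-token distribution."""
    tokens = [start_token]
    for _ in range(length):
        probs = np.exp(log_softmax(student_table[tokens[-1]]))
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return np.array(tokens)

def per_token_reverse_kl(student_table, teacher_table, tokens):
    """KL(student || teacher) at each position of a student-generated sequence."""
    contexts = tokens[:-1]  # toy assumption: the context is just the previous token
    s_logp = log_softmax(student_table[contexts])
    t_logp = log_softmax(teacher_table[contexts])
    return (np.exp(s_logp) * (s_logp - t_logp)).sum(axis=-1)

rng = np.random.default_rng(0)
vocab = 8  # hypothetical toy vocabulary size
teacher_table = rng.normal(size=(vocab, vocab))
student_table = rng.normal(size=(vocab, vocab))

# Step 1: the student generates its own output (on-policy data).
tokens = sample_on_policy(student_table, start_token=0, length=5, rng=rng)
# Step 2: the teacher provides dense, token-level feedback on that output.
feedback = per_token_reverse_kl(student_table, teacher_table, tokens)
loss = float(feedback.mean())  # scalar a training loop would minimize
```

In a real setup the tables would be replaced by the student and teacher LLMs, and `loss` would be differentiated with respect to the student's parameters. The point of the sketch is only that the training signal is computed on student-generated sequences and at every token position, rather than as a single sequence-level reward.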
