Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving

Luke Rowe; Rodrigue de Schaetzen; Roger Girgis; Christopher Pal; Liam Paull

arXiv:2506.11234·cs.RO·November 7, 2025

TL;DR

This paper introduces Poutine, a scalable vision-language-trajectory pre-training and reinforcement learning post-training recipe that enables robust end-to-end autonomous driving without additional model components, placing 1st in the Waymo Challenge with an RFS of 7.99.

Contribution

The work demonstrates that large vision-language models can be effectively adapted for autonomous driving through simple pretraining and lightweight RL fine-tuning, eliminating the need for handcrafted tokenizers or complex architectures.

Findings

01  Achieved 1st place in the Waymo Challenge with an RFS of 7.99

02  Scalable VLT pre-training improves driving robustness

03  Lightweight RL fine-tuning enhances performance in long-tail scenarios

Abstract

Maintaining good driving behavior in out-of-distribution scenarios remains a critical challenge in autonomous driving. A promising direction is to leverage the generalist knowledge and reasoning capabilities of large language models by treating unusual driving scenarios as a logical reasoning task. In this work, we present Poutine, a method that uses an off-the-shelf 3B-parameter vision-language model (VLM), without any additional components, to achieve robust end-to-end autonomous driving via a simple and scalable training recipe. To learn strong base driving capabilities, we first train Poutine-Base using self-supervised next-token prediction over vision, language, and trajectory (VLT) tokens, leveraging both nominal and long-tail driving data. In the second stage, we fine-tune Poutine-Base using Group Relative Policy Optimization (GRPO) with a small set of human preference-labeled…
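
For readers unfamiliar with GRPO, the following is a minimal sketch of its central step: scoring each sampled output relative to the other samples drawn for the same input, rather than against a learned value function. It is not the authors' implementation. The function name, the reward values, and the choice of population standard deviation are assumptions made for illustration; the reward design and group size used for Poutine are not given in this excerpt.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same input.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    that beat their siblings get a positive advantage and the rest negative.
    Using the population std here is an illustrative assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for three trajectories sampled for one driving scene.
example_rewards = [1.0, 2.0, 3.0]
example_advantages = group_relative_advantages(example_rewards)
```

Because the advantages are centered within the group, they sum to approximately zero, and a group whose samples all receive the same reward contributes no learning signal.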

Taxonomy

Topics: Robotic Path Planning Algorithms · Advanced Neural Network Applications · Vehicle License Plate Recognition

Methods: Balanced Selection