Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving
Ran Tian, Boyi Li, Xinshuo Weng, Yuxiao Chen, Edward Schmerling, Yue Wang, Boris Ivanovic, and Marco Pavone

TL;DR
This paper introduces TOKEN, a multi-modal large language model that tokenizes object-level knowledge to improve autonomous driving in rare, long-tail scenarios, significantly reducing errors and collisions.
Contribution
We propose TOKEN, a novel Multi-Modal Large Language Model (MM-LLM) that enhances autonomous vehicle planning by leveraging object-level scene representations and reasoning alignment to address long-tail event challenges.
Findings
27% reduction in trajectory L2 error
39% decrease in collision rates
Outperforms existing frameworks in long-tail scenarios
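For readers unfamiliar with the metrics, the sketch below shows one common way a trajectory L2 error and a relative reduction can be computed. It is illustrative only: the function names, the waypoint values, and the choice to average over all waypoints are assumptions, not the paper's evaluation protocol, and the numbers are made up rather than taken from the paper.

```python
import math


def trajectory_l2_error(predicted, ground_truth):
    """Mean Euclidean distance between matching (x, y) waypoints.

    Assumption: errors are averaged over all waypoints; the paper's
    exact averaging scheme is not stated in this listing.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical waypoints in metres (not data from the paper).
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
baseline_plan = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
improved_plan = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]

baseline_error = trajectory_l2_error(baseline_plan, ground_truth)
improved_error = trajectory_l2_error(improved_plan, ground_truth)
reduction = relative_reduction(baseline_error, improved_error)
```

Under this reading, a reported "27% reduction" would correspond to `relative_reduction(...)` returning 0.27 for the planner's error against a baseline's error.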
Abstract
The autonomous driving industry is increasingly adopting end-to-end learning from sensory inputs to minimize human biases in system design. Traditional end-to-end driving models, however, suffer from long-tail events due to rare or unseen inputs within their training distributions. To address this, we propose TOKEN, a novel Multi-Modal Large Language Model (MM-LLM) that tokenizes the world into object-level knowledge, enabling better utilization of LLM's reasoning capabilities to enhance autonomous vehicle planning in long-tail scenarios. TOKEN effectively alleviates data scarcity and inefficient tokenization by leveraging a traditional end-to-end driving model to produce condensed and semantically enriched representations of the scene, which are optimized for LLM planning compatibility through deliberate representation and reasoning alignment training stages. Our results demonstrate…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
