Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning
Antonio Guillen-Perez

TL;DR
This paper systematically compares data-centric strategies for offline reinforcement learning in autonomous vehicle planning, demonstrating that intelligent data curation significantly enhances safety and performance in real-world driving scenarios.
Contribution
It evaluates six criticality weighting schemes, grouped into heuristic-based, uncertainty-based, and behavior-based families, at two temporal scales (individual timestep and complete scenario), showing how much data curation matters for robust offline RL in autonomous motion planning.
Findings
Curation based on model uncertainty cuts the collision rate nearly threefold.
Scenario-level weighting improves long-horizon planning safety.
Timestep-level weighting enhances reactive safety.
Abstract
Offline Reinforcement Learning (RL) presents a promising paradigm for training autonomous vehicle (AV) planning policies from large-scale, real-world driving logs. However, the extreme data imbalance in these logs, where mundane scenarios vastly outnumber rare "long-tail" events, leads to brittle and unsafe policies when using standard uniform data sampling. In this work, we address this challenge through a systematic, large-scale comparative study of data curation strategies designed to focus the learning process on information-rich samples. We investigate six distinct criticality weighting schemes, categorized into three families: heuristic-based, uncertainty-based, and behavior-based. These are evaluated at two temporal scales: the individual timestep and the complete scenario. We train seven goal-conditioned Conservative Q-Learning (CQL) agents with a state-of-the-art,…
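To make the contrast between uniform sampling and criticality-weighted sampling concrete, here is a minimal sketch of how per-sample scores could drive a replay sampler at the two temporal scales the abstract names. It is not the paper's implementation: the function names, the toy scores, the `eps` floor, and the choice of the scenario mean as the scenario-level aggregate are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def timestep_weights(scores, eps=1e-6):
    """Normalize per-timestep criticality scores into sampling probabilities.

    eps (an assumed floor) keeps every transition reachable by the sampler.
    """
    shifted = [max(s, 0.0) + eps for s in scores]
    total = sum(shifted)
    return [s / total for s in shifted]


def scenario_weights(scores, scenario_ids, eps=1e-6):
    """Give every timestep the mean score of its scenario, then normalize.

    Using the mean as the scenario-level aggregate is an assumption.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for s, sid in zip(scores, scenario_ids):
        sums[sid] += s
        counts[sid] += 1
    per_step = [sums[sid] / counts[sid] for sid in scenario_ids]
    return timestep_weights(per_step, eps)


def sample_batch(weights, batch_size, rng):
    """Draw transition indices with replacement, proportional to weight."""
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


# Toy data (hypothetical): two scenarios of three timesteps each;
# scenario "b" contains a single highly critical timestep.
scores = [0.1, 0.1, 0.1, 0.1, 0.9, 0.1]
scenario_ids = ["a", "a", "a", "b", "b", "b"]

w_step = timestep_weights(scores)
w_scen = scenario_weights(scores, scenario_ids)
batch = sample_batch(w_step, batch_size=4, rng=random.Random(0))
```

With timestep-level weights, only the single critical transition is oversampled. With scenario-level weights, every transition in scenario "b" is upweighted equally, so the sampler also sees the steps leading up to and following the critical moment.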
