Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning

Antonio Guillen-Perez

arXiv:2508.18397·cs.RO·September 18, 2025

TL;DR

This paper systematically compares data-centric curation strategies for offline reinforcement learning in autonomous vehicle planning, showing that criticality-weighted sampling of real-world driving logs improves safety and performance over standard uniform sampling.

Contribution

It provides a comprehensive evaluation of six criticality weighting schemes at two temporal scales (the individual timestep and the complete scenario), highlighting the importance of data curation for robust offline RL in autonomous motion planning.

Findings

01 Model uncertainty-based curation cuts the collision rate nearly threefold.

02 Scenario-level weighting improves long-horizon planning safety.

03 Timestep-level weighting enhances reactive safety.

Abstract

Offline Reinforcement Learning (RL) presents a promising paradigm for training autonomous vehicle (AV) planning policies from large-scale, real-world driving logs. However, the extreme data imbalance in these logs, where mundane scenarios vastly outnumber rare "long-tail" events, leads to brittle and unsafe policies when using standard uniform data sampling. In this work, we address this challenge through a systematic, large-scale comparative study of data curation strategies designed to focus the learning process on information-rich samples. We investigate six distinct criticality weighting schemes which are categorized into three families: heuristic-based, uncertainty-based, and behavior-based. These are evaluated at two temporal scales, the individual timestep and the complete scenario. We train seven goal-conditioned Conservative Q-Learning (CQL) agents with a state-of-the-art,…
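
To make the two temporal scales concrete, here is a minimal sketch of criticality-weighted sampling in Python with NumPy. It is an illustration under stated assumptions, not the paper's implementation: the function names (`sampling_probs`, `timestep_level_probs`, `scenario_level_probs`), the `temperature` and `eps` parameters, the mean aggregation per scenario, and the toy scores are all hypothetical. The scores stand in for whichever heuristic, uncertainty, or behavior signal is used.

```python
import numpy as np


def sampling_probs(scores, temperature=1.0, eps=1e-8):
    """Turn non-negative criticality scores into a sampling distribution.

    `temperature` and `eps` are illustrative knobs, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=np.float64)
    weights = (scores + eps) ** (1.0 / temperature)
    return weights / weights.sum()


def timestep_level_probs(scores):
    """Each transition is weighted by its own criticality score."""
    return sampling_probs(scores)


def scenario_level_probs(scores, scenario_ids):
    """Each transition inherits the mean score of its scenario (assumed aggregation)."""
    scores = np.asarray(scores, dtype=np.float64)
    _, inverse = np.unique(np.asarray(scenario_ids), return_inverse=True)
    per_scenario_mean = np.bincount(inverse, weights=scores) / np.bincount(inverse)
    return sampling_probs(per_scenario_mean[inverse])


# Toy data: four transitions from two scenarios; transition 2 is "critical".
toy_scores = [0.1, 0.1, 0.9, 0.1]
toy_scenarios = [0, 0, 1, 1]

p_timestep = timestep_level_probs(toy_scores)
p_scenario = scenario_level_probs(toy_scores, toy_scenarios)

# Draw a training batch of transition indices instead of sampling uniformly.
rng = np.random.default_rng(0)
batch_indices = rng.choice(len(toy_scores), size=8, p=p_timestep)
```

In this sketch, timestep-level weighting concentrates probability on the single critical transition, while scenario-level weighting spreads it evenly over every transition in the scenario that contains it. That mirrors the contrast drawn above between reactive safety and long-horizon planning.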
