Are All Data Necessary? Efficient Data Pruning for Large-scale Autonomous Driving Dataset via Trajectory Entropy Maximization

Zhaoyang Liu; Weitao Zhou; Junze Wen; Cheng Jing; Qian Cheng; Kun Jiang; Diange Yang

arXiv:2512.19270·cs.RO·December 23, 2025

TL;DR

This paper introduces an information-theoretic data pruning method for large-scale autonomous driving datasets that reduces data volume by up to 40% without sacrificing model performance, by selecting high-value samples based on trajectory entropy.

Contribution

It proposes a novel, model-agnostic data pruning approach based on trajectory entropy maximization, with theoretical guarantees on maintaining data distribution similarity.

Findings

01. Reduces dataset size by up to 40%
02. Maintains closed-loop performance in autonomous driving tasks
03. Provides a theoretically grounded data selection method

Abstract

Collecting large-scale naturalistic driving data is essential for training robust autonomous driving planners. However, real-world datasets often contain a substantial amount of repetitive and low-value samples, which lead to excessive storage costs and bring limited benefits to policy learning. To address this issue, we propose an information-theoretic data pruning method that effectively reduces the training data volume without compromising model performance. Our approach evaluates the trajectory distribution information entropy of driving data and iteratively selects high-value samples that preserve the statistical characteristics of the original dataset in a model-agnostic manner. From a theoretical perspective, we show that maximizing trajectory entropy effectively constrains the Kullback-Leibler divergence between the pruned subset and the original data distribution, thereby…
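The abstract describes iteratively selecting samples so that the entropy of the trajectory distribution stays high. A minimal sketch of that idea follows. It is not the authors' algorithm. It assumes each sample has already been assigned to a discrete trajectory bin (the hypothetical `bin_ids` array), and it uses a simple greedy rule chosen for illustration: repeatedly drop one sample from the most populated bin. The function names and the `keep_ratio` parameter are also assumptions.

```python
import numpy as np


def shannon_entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution given by bin counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log(p)).sum())


def prune_by_entropy(bin_ids, keep_ratio, seed=0):
    """Greedily drop samples from the largest trajectory bin.

    bin_ids:    hypothetical per-sample bin label (e.g. a trajectory cluster id).
    keep_ratio: fraction of samples to keep, in (0, 1].
    Returns the sorted indices of the kept samples.
    """
    bin_ids = np.asarray(bin_ids)
    n_keep = int(round(keep_ratio * len(bin_ids)))
    rng = np.random.default_rng(seed)

    # Sample indices per bin, shuffled so that removal within a bin is random.
    members = {
        b: list(rng.permutation(np.flatnonzero(bin_ids == b)))
        for b in np.unique(bin_ids)
    }

    n_current = len(bin_ids)
    while n_current > n_keep:
        # Removing a sample from the most populated bin flattens the histogram.
        largest = max(members, key=lambda b: len(members[b]))
        members[largest].pop()
        n_current -= 1

    return sorted(int(i) for m in members.values() for i in m)


if __name__ == "__main__":
    # Toy data: 8 samples of a common maneuver (bin 0), 2 of a rare one (bin 1).
    bin_ids = np.array([0] * 8 + [1] * 2)
    kept = prune_by_entropy(bin_ids, keep_ratio=0.6)
    print("kept indices:", kept)
    print("entropy before:", shannon_entropy(np.bincount(bin_ids)))
    print("entropy after: ", shannon_entropy(np.bincount(bin_ids[kept])))
```

In this toy case the rare bin is left untouched and only repetitive samples are dropped, so the entropy of the kept subset is higher than that of the full set. How trajectories are actually discretized, and how the selection relates to the Kullback-Leibler bound mentioned in the abstract, is specified in the paper and is not reproduced here.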

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Reinforcement Learning in Robotics