Modeling and Measuring Redundancy in Multisource Multimodal Data for Autonomous Driving
Yuhan Zhou, Mehri Sattari, Haihua Chen, Kewei Sha

TL;DR
This paper investigates redundancy in multisource and multimodal data for autonomous vehicles, modeling and measuring it to improve data quality and object detection performance.
Contribution
It introduces methods to quantify and analyze redundancy in AV datasets, demonstrating how removing redundant labels can enhance object detection accuracy.
Findings
Removing redundant multisource labels improves YOLOv8 detection accuracy.
Significant redundancy exists between image and LiDAR data.
Redundancy measurement can guide data quality improvements in AV datasets.
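As a rough illustration of the multisource case, the sketch below counts how many labeled objects appear in more than one camera at a single time step, then keeps one label per object. It is not the paper's method: the tuple layout, the function names `redundancy_ratio` and `drop_redundant`, the instance ids, and the "keep the largest box" rule are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical label format: (camera_name, instance_id, box_area_in_pixels).
# The paper's actual redundancy model may differ; this is only a sketch.

def redundancy_ratio(labels):
    """Fraction of unique instances that are labeled in more than one camera."""
    cams_per_instance = defaultdict(set)
    for camera, instance_id, _area in labels:
        cams_per_instance[instance_id].add(camera)
    if not cams_per_instance:
        return 0.0
    shared = sum(1 for cams in cams_per_instance.values() if len(cams) > 1)
    return shared / len(cams_per_instance)

def drop_redundant(labels):
    """Keep, for each instance, only the label with the largest box area (assumed rule)."""
    best = {}
    for label in labels:
        _camera, instance_id, area = label
        if instance_id not in best or area > best[instance_id][2]:
            best[instance_id] = label
    return sorted(best.values())

# Made-up example: one car seen by two overlapping cameras, one pedestrian seen by one.
labels = [
    ("CAM_FRONT", "car_1", 1200.0),
    ("CAM_FRONT_RIGHT", "car_1", 300.0),
    ("CAM_FRONT", "ped_7", 150.0),
]
ratio = redundancy_ratio(labels)
kept = drop_redundant(labels)
```

Here one of the two unique instances is shared across cameras, so `ratio` is 0.5, and `kept` retains the larger `car_1` box from `CAM_FRONT` together with the single `ped_7` label.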
Abstract
Next-generation autonomous vehicles (AVs) rely on large volumes of multisource and multimodal data to support real-time decision-making. In practice, data quality (DQ) varies across sources and modalities due to environmental conditions and sensor limitations, yet AV research has largely prioritized algorithm design over DQ analysis. This work focuses on redundancy as a fundamental but underexplored DQ issue in AV datasets. Using the nuScenes and Argoverse 2 (AV2) datasets, we model and measure redundancy in multisource camera data and multimodal image-LiDAR data, and evaluate how removing redundant labels affects the YOLOv8 object detection task. Experimental results show that selectively removing redundant multisource image object labels from cameras with shared fields of view improves detection. In nuScenes, mAP gains from to , to , and from…
Taxonomy
Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Adversarial Robustness in Machine Learning
