Concept-based Explainable Data Mining with VLM for 3D Detection
Mai Tsujimoto

TL;DR
This paper introduces a cross-modal framework using vision-language models to identify rare objects in driving scenes, improving 3D detection performance with less data and enhancing autonomous driving safety.
Contribution
It presents a novel concept-based data mining method leveraging VLMs and outlier detection to efficiently identify and annotate rare objects in autonomous driving datasets.
Findings
Enhanced detection of rare objects like trailers and bicycles.
Significant performance gains with reduced training data.
Effective concept-guided data curation for autonomous systems.
Abstract
Rare-object detection remains a challenging task in autonomous driving systems, particularly when relying solely on point cloud data. Although Vision-Language Models (VLMs) exhibit strong capabilities in image understanding, their potential to enhance 3D object detection through intelligent data mining has not been fully explored. This paper proposes a novel cross-modal framework that leverages 2D VLMs to identify and mine rare objects from driving scenes, thereby improving 3D object detection performance. Our approach synthesizes complementary techniques such as object detection, semantic feature extraction, dimensionality reduction, and multi-faceted outlier detection into a cohesive, explainable pipeline that systematically identifies rare but critical objects in driving scenes. By combining Isolation Forest and t-SNE-based outlier detection methods with concept-based filtering, the…
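The abstract names Isolation Forest outlier detection combined with concept-based filtering, but the text is truncated before it gives details. The sketch below is therefore only an illustration, not the paper's implementation. It is a minimal Isolation Forest written with the Python standard library, applied to toy 2D points that stand in for reduced semantic features. The concept labels, the `target_concepts` set, the top-5 cut-off, and all function names are assumptions made for this example. The t-SNE-based step is not shown.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing left to separate along this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(node, x):
    """Depth at which x reaches a leaf, plus a correction for unsplit leaf points."""
    depth = 0
    while "dim" in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1]; points that are isolated quickly score higher."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    m = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(points, m), 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(m)
    return [
        2.0 ** (-sum(path_length(t, x) for t in trees) / (n_trees * norm))
        for x in points
    ]


# Toy data: 200 "common" feature vectors plus one far-away vector.
data_rng = random.Random(0)
embeddings = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
embeddings.append((10.0, 10.0))
# Hypothetical concept labels, standing in for what a VLM might assign per object.
concepts = ["car"] * 200 + ["trailer"]
target_concepts = {"trailer", "bicycle"}

scores = isolation_scores(embeddings)
# Keep the five highest-scoring candidates, then filter them by concept.
top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:5]
mined = [i for i in top if concepts[i] in target_concepts]
print(mined)
```

In this toy setup the outlier score only proposes candidates, and the concept filter decides which of them are kept as rare objects of interest. This is one plausible way to make the selection explainable, since every mined sample comes with a human-readable concept label.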
Taxonomy
Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI)
