Multimodal Foundational Models for Unsupervised 3D General Obstacle Detection
Tamás Matuszka, Péter Hajas, Dávid Szeghy

TL;DR
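This paper introduces a training-free approach that combines multimodal foundational model-based obstacle segmentation with geometric outlier detection to identify general obstacles in 3D for autonomous driving. To work around the limitations of publicly available obstacle detection datasets, the authors also collected and annotated their own dataset.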
Contribution
It presents a novel offline, training-free method that integrates multimodal foundational models with geometric outlier detection for 3D obstacle detection.
Findings
Effective detection of diverse obstacles in 3D without retraining
Leverages non-causal, offline processing for autonomous perception
New annotated dataset with various obstacles in distant regions
Abstract
Current autonomous driving perception models primarily rely on supervised learning with predefined categories. However, these models struggle to detect general obstacles not included in the fixed category set due to their variability and numerous edge cases. To address this issue, we propose a combination of multimodal foundational model-based obstacle segmentation with traditional unsupervised computational geometry-based outlier detection. Our approach operates offline, allowing us to leverage non-causality, and utilizes training-free methods. This enables the detection of general obstacles in 3D without the need for expensive retraining. To overcome the limitations of publicly available obstacle detection datasets, we collected and annotated our own dataset, which includes various obstacles even in distant regions.
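The abstract does not specify which geometric outlier detector is used or how it is combined with the segmentation output. As a rough illustration only, the sketch below assumes one common choice: fit a ground plane to a point cloud with a basic RANSAC loop, treat points sufficiently far above that plane as geometric outliers, and keep only those that a segmentation model also marks as obstacle. The function names, the `segmentation_mask` input (a hypothetical per-point boolean array, e.g. obtained by projecting a 2D mask onto the points), the thresholds, and the logical-AND fusion are all assumptions, not the authors' method.

```python
import numpy as np


def fit_ground_plane_ransac(points, n_iters=200, inlier_tol=0.1, seed=0):
    """Fit a plane n.p + d = 0 to an (N, 3) point cloud with a basic RANSAC loop.

    Illustrative only: the paper does not state which plane-fitting or
    outlier-detection method it uses.
    """
    rng = np.random.default_rng(seed)
    best_normal, best_d, best_count = None, 0.0, -1
    for _ in range(n_iters):
        # Sample three distinct points and build the plane through them.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # skip degenerate (collinear) samples
            continue
        normal = normal / norm
        d = -float(normal @ sample[0])
        # Count points within inlier_tol of the candidate plane.
        count = int(np.sum(np.abs(points @ normal + d) < inlier_tol))
        if count > best_count:
            best_normal, best_d, best_count = normal, d, count
    if best_normal is None:
        raise ValueError("could not fit a plane: all samples were degenerate")
    # Orient the normal towards +z so that signed distance means "height".
    if best_normal[2] < 0:
        best_normal, best_d = -best_normal, -best_d
    return best_normal, best_d


def obstacle_candidates(points, segmentation_mask, min_height=0.2, **ransac_kwargs):
    """Return a boolean mask of points that are both geometric outliers
    (more than min_height above the fitted ground plane) and flagged by the
    hypothetical per-point segmentation mask. The AND fusion is an assumption.
    """
    normal, d = fit_ground_plane_ransac(points, **ransac_kwargs)
    height = points @ normal + d
    geometric_outlier = height > min_height
    return geometric_outlier & np.asarray(segmentation_mask, dtype=bool)
```

Calling `obstacle_candidates(points, segmentation_mask)` on an `(N, 3)` array returns a length-`N` boolean mask. Because the method described in the abstract runs offline, a real pipeline could also aggregate points over past and future frames before this step; the sketch handles a single point cloud only.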
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
Methods: Sparse Evolutionary Training
