Multimodal Foundational Models for Unsupervised 3D General Obstacle Detection

Tamás Matuszka; Péter Hajas; Dávid Szeghy

arXiv:2408.12322·cs.CV·August 23, 2024

Open Access

TL;DR

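This paper introduces a training-free, offline approach that combines multimodal foundational model-based obstacle segmentation with unsupervised geometric outlier detection to identify general obstacles in 3D for autonomous driving. The authors also collect and annotate their own dataset to address the limitations of publicly available obstacle detection datasets.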

Contribution

The paper presents an offline, training-free method that combines multimodal foundational models with geometric outlier detection to detect general obstacles in 3D.

Findings

01 Effective detection of diverse obstacles in 3D without retraining

02 Leverages non-causal, offline processing for autonomous perception

03 New annotated dataset with various obstacles in distant regions

Abstract

Current autonomous driving perception models primarily rely on supervised learning with predefined categories. However, these models struggle to detect general obstacles not included in the fixed category set due to their variability and numerous edge cases. To address this issue, we propose a combination of multimodal foundational model-based obstacle segmentation with traditional unsupervised computational geometry-based outlier detection. Our approach operates offline, allowing us to leverage non-causality, and utilizes training-free methods. This enables the detection of general obstacles in 3D without the need for expensive retraining. To overcome the limitations of publicly available obstacle detection datasets, we collected and annotated our dataset, which includes various obstacles even in distant regions.
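
The abstract does not say which geometric outlier detection technique is used. Purely as an illustration of the general idea, the sketch below swaps in a common stand-in: fit a ground plane to a 3D point cloud with a basic RANSAC loop, then flag points that lie far from that plane as obstacle candidates. The function names (fit_plane_ransac, obstacle_candidates), the thresholds, and the choice of RANSAC are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def fit_plane_ransac(points, n_iters=200, inlier_tol=0.05, seed=0):
    """Fit a plane n.p + d = 0 to an (N, 3) array with a basic RANSAC loop.

    Hypothetical helper for illustration; not the paper's method.
    """
    rng = np.random.default_rng(seed)
    best_count, best_plane = -1, None
    for _ in range(n_iters):
        # Sample three distinct points and build the plane through them.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # skip degenerate (nearly collinear) samples
            continue
        normal = normal / norm
        d = -normal @ sample[0]
        # Count points within inlier_tol of the candidate plane.
        count = int(np.sum(np.abs(points @ normal + d) < inlier_tol))
        if count > best_count:
            best_count, best_plane = count, (normal, d)
    if best_plane is None:
        raise ValueError("could not fit a plane: all samples were degenerate")
    return best_plane


def obstacle_candidates(points, min_height=0.15, **ransac_kwargs):
    """Return a boolean mask of points farther than min_height from the plane."""
    normal, d = fit_plane_ransac(points, **ransac_kwargs)
    return np.abs(points @ normal + d) > min_height
```

In a pipeline like the one the abstract describes, a mask of this kind would presumably be combined with the segmentation output of the foundational model; how the two are fused is not stated in the abstract, so it is left out here.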

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Video Surveillance and Tracking Methods

Methods: Sparse Evolutionary Training