Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving

Zhenjiang Mao; Dong-You Jhong; Ao Wang; Ivan Ruchkin

arXiv:2405.01691 · cs.CV · May 6, 2024

Open Access

TL;DR

This paper introduces a language-enhanced approach for out-of-distribution detection in autonomous driving, leveraging multimodal models like CLIP to improve transparency and performance over traditional encoder-based methods.

Contribution

It proposes using cosine similarity of image and text representations from CLIP for more transparent and controllable OOD detection in autonomous driving.
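The representation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the image and text embeddings have already been produced by CLIP's image and text encoders, and it uses short hand-written vectors as stand-ins. The function names `cosine_similarity` and `language_representation` are hypothetical, and the example prompts in the comments are placeholders rather than prompts taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def language_representation(
    image_embedding: Sequence[float],
    text_embeddings: Sequence[Sequence[float]],
) -> List[float]:
    """One similarity score per text prompt.

    Each entry says how well the image matches one human-written
    description, so every dimension of the result has a readable meaning.
    """
    return [cosine_similarity(image_embedding, t) for t in text_embeddings]


# Toy stand-ins for CLIP outputs (assumption: real embeddings would be
# much higher-dimensional and come from the CLIP encoders).
text_embeddings = [
    [1.0, 0.0, 0.0],  # e.g. a prompt such as "a clear road" (placeholder)
    [0.0, 1.0, 0.0],  # e.g. a prompt such as "a foggy road" (placeholder)
]
image_embedding = [3.0, 4.0, 0.0]

representation = language_representation(image_embedding, text_embeddings)
print(representation)  # approximately [0.6, 0.8]
```

A downstream detector would then operate on this vector of similarities instead of on an opaque encoder latent. How that detector is built and thresholded is not specified in this summary, so it is left out of the sketch.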

Findings

01. Language-based representations outperform traditional vision encoder features.

02. Combining language and visual features improves detection accuracy.

03. The approach enhances transparency and user interpretability in OOD detection.

Abstract

Out-of-distribution (OOD) detection is essential in autonomous driving for determining when learning-based components encounter unexpected inputs. Traditional detectors typically use encoder models with fixed settings and therefore lack effective means of human interaction. With the rise of large foundation models, multimodal inputs make it possible to use human language as a latent representation, enabling language-defined OOD detection. In this paper, we use the cosine similarity of image and text representations encoded by the multimodal model CLIP as a new representation that improves the transparency and controllability of the latent encodings used for visual anomaly detection. We compare our approach with existing pre-trained encoders, whose latent representations are meaningless from the user's standpoint. Our experiments on realistic driving data show that…

Taxonomy

Topics: Traffic Prediction and Management Techniques · Autonomous Vehicle Technology and Safety · Time Series Analysis and Forecasting

Methods: Contrastive Language-Image Pre-training