Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving
Zhenjiang Mao, Dong-You Jhong, Ao Wang, Ivan Ruchkin

TL;DR
This paper introduces a language-enhanced approach for out-of-distribution detection in autonomous driving, leveraging multimodal models like CLIP to improve transparency and performance over traditional encoder-based methods.
Contribution
The paper proposes using the cosine similarity between CLIP image and text representations as the latent representation for OOD detection in autonomous driving, making the detector more transparent and controllable.
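A minimal sketch of the idea, not the authors' implementation: each image is described by its cosine similarity to a set of text prompts, so every dimension of the representation has a human-readable meaning. The prompts, the toy 3-dimensional embeddings (standing in for real CLIP outputs), and the distance-based `ood_score` are all assumptions made for illustration; the summary above does not specify them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def language_representation(image_embedding, text_embeddings):
    """Return one similarity per text prompt for a single image embedding."""
    return [cosine_similarity(image_embedding, t) for t in text_embeddings]


def ood_score(representation, in_dist_mean):
    """Hypothetical score: Euclidean distance to the mean in-distribution
    representation. The paper's actual detector may differ."""
    return math.sqrt(
        sum((r - m) ** 2 for r, m in zip(representation, in_dist_mean))
    )


# Hypothetical prompts; in practice a user would write these.
prompts = ["a clear road in daylight", "a foggy road", "a road at night"]

# Toy embeddings standing in for CLIP text and image encoder outputs.
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [3.0, 4.0, 0.0]

rep = language_representation(image_embedding, text_embeddings)

# Assumed mean representation of in-distribution (clear daylight) images.
in_dist_mean = [0.9, 0.1, 0.1]
score = ood_score(rep, in_dist_mean)

for prompt, sim in zip(prompts, rep):
    print(f"{prompt}: {sim:.2f}")
print(f"OOD score: {score:.3f}")
```

Because each entry of `rep` is tied to a prompt, a user can read off why an image was flagged (here, high similarity to "a foggy road") and can change the detector's behavior by editing the prompts rather than retraining an encoder.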
Findings
Language-based representations outperform traditional vision encoder features.
Combining language and visual features improves detection accuracy.
The approach enhances transparency and user interpretability in OOD detection.
Abstract
Out-of-distribution (OOD) detection is essential in autonomous driving to determine when learning-based components encounter unexpected inputs. Traditional detectors typically use encoder models with fixed settings, thus lacking effective human interaction capabilities. With the rise of large foundation models, multimodal inputs offer the possibility of taking human language as a latent representation, thus enabling language-defined OOD detection. In this paper, we use the cosine similarity of image and text representations encoded by the multimodal model CLIP as a new representation to improve the transparency and controllability of latent encodings used for visual anomaly detection. We compare our approach with existing pre-trained encoders that can only produce latent representations that are meaningless from the user's standpoint. Our experiments on realistic driving data show that…
Taxonomy
Topics: Traffic Prediction and Management Techniques · Autonomous Vehicle Technology and Safety · Time Series Analysis and Forecasting
Methods: Contrastive Language-Image Pre-training
