When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection

Adam Goodge; Bryan Hooi; Wee Siong Ng

arXiv:2407.17083 · cs.CV · July 25, 2024

TL;DR

This paper identifies a bias in CLIP's similarity scores for text and images that hampers anomaly detection, and proposes BLISS, a simple method using external text inputs to correct this bias, improving detection performance.

Contribution

The paper uncovers a similarity bias in CLIP embeddings and introduces BLISS, a novel bias correction method that enhances anomaly detection without extensive training.

Findings

01. BLISS significantly outperforms baseline methods on benchmark datasets.

02. The similarity bias causes false negatives and positives in anomaly detection.

03. BLISS effectively corrects bias using external text inputs without heavy training.

Abstract

Contrastive Language-Image Pre-training (CLIP) achieves remarkable performance in various downstream tasks through the alignment of image and text input embeddings and holds great promise for anomaly detection. However, our empirical experiments show that the embeddings of text inputs unexpectedly cluster tightly together, far away from image embeddings, contrary to the model's contrastive training objective to align image-text input pairs. We show that this phenomenon induces a 'similarity bias', in which false negative and false positive errors occur due to bias in the similarities between images and the normal label text embeddings. To address this bias, we propose a novel methodology called BLISS, which directly accounts for this similarity bias through the use of an auxiliary, external set of text inputs. BLISS is simple: it does not require strong inductive biases about anomalous…
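
The abstract excerpt describes correcting image-text similarities with an auxiliary set of text inputs, but it does not give the scoring formula. The following is only a minimal sketch of the general idea, under a stated assumption: each image's similarity to the normal label text is standardized against that same image's similarities to the auxiliary text. The function name `bias_corrected_anomaly_scores`, the z-score form, and the random stand-in embeddings are all hypothetical illustrations, not the paper's definition of BLISS; in practice the embeddings would come from CLIP's image and text encoders.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def bias_corrected_anomaly_scores(image_emb, normal_text_emb, aux_text_emb):
    """Illustrative bias correction (an assumption, not the paper's exact method).

    image_emb:       (n_images, d) image embeddings
    normal_text_emb: (n_labels, d) embeddings of the normal class label text
    aux_text_emb:    (n_aux, d)    embeddings of an external, auxiliary text set
    Returns an (n_images,) array; higher means more anomalous.
    """
    image_emb = l2_normalize(image_emb)
    normal_text_emb = l2_normalize(normal_text_emb)
    aux_text_emb = l2_normalize(aux_text_emb)

    # Raw score: best cosine similarity between each image and any normal label.
    sim_normal = (image_emb @ normal_text_emb.T).max(axis=1)

    # Reference level: how similar each image is to text in general,
    # estimated from the auxiliary text set.
    sim_aux = image_emb @ aux_text_emb.T
    aux_mean = sim_aux.mean(axis=1)
    aux_std = sim_aux.std(axis=1) + 1e-8  # avoid division by zero

    # Standardize the raw score against the per-image reference level.
    corrected = (sim_normal - aux_mean) / aux_std
    return -corrected


# Random vectors stand in for CLIP embeddings purely to show the shapes.
rng = np.random.default_rng(0)
images = rng.normal(size=(5, 16))
normal_text = rng.normal(size=(2, 16))
aux_text = rng.normal(size=(50, 16))
scores = bias_corrected_anomaly_scores(images, normal_text, aux_text)
```

The design choice in this sketch is that an image is judged by how much closer it sits to the normal label text than to text in general, rather than by its raw similarity alone, which is one plausible way to offset a shared offset between text and image embeddings.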

Taxonomy

Methods: Sparse Evolutionary Training · ALIGN