Dissecting Representation Misalignment in Contrastive Learning via Influence Function

Lijie Hu, Chenyang Ren, Huanyi Xie, Khouloud Saadi, Shu Yang, Zhen Tan, Jingfeng Zhang, and Di Wang

arXiv:2411.11667·cs.LG·February 4, 2025

TL;DR

This paper introduces ECIF, an influence function tailored for contrastive learning, enabling efficient detection of data misalignments and improving interpretability of large-scale multimodal models like CLIP.

Contribution

The paper presents ECIF, a novel influence function for contrastive loss that considers both positive and negative samples, enhancing data valuation and model transparency.

Findings

1. ECIF provides accurate influence estimates without retraining.
2. ECIF improves detection of data misalignments in contrastive models.
3. Experimental results show ECIF outperforms baseline methods in interpretability.

Abstract

Contrastive learning, commonly applied in large-scale multimodal models, often relies on data from diverse and often unreliable sources, which can include misaligned or mislabeled text-image pairs. This frequently leads to robustness issues and hallucinations, ultimately causing performance degradation. Data valuation is an efficient way to detect and trace these misalignments. Nevertheless, existing methods are computationally expensive for large-scale models. Although computationally efficient, classical influence functions are inadequate for contrastive learning models, as they were initially designed for pointwise loss. Furthermore, contrastive learning involves minimizing the distance between positive sample modalities while maximizing the distance between negative sample modalities. This necessitates evaluating the influence of samples from both perspectives. To tackle these…
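To make the abstract's point about positive and negative samples concrete, here is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss in NumPy. It is not ECIF and not the authors' code; the function name `clip_style_loss`, the temperature value, and the random toy embeddings are all assumptions chosen for illustration. It only shows why the loss is not pointwise: each pair's loss is computed against every other pair in the batch, which serve as its negatives.

```python
import numpy as np

def clip_style_loss(img, txt, temperature=0.1):
    """Per-pair symmetric InfoNCE loss for a batch of paired embeddings (illustrative)."""
    # L2-normalize so dot products are cosine similarities.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    # (n, n) similarity matrix: diagonal entries are positives, off-diagonal are negatives.
    logits = img @ txt.T / temperature

    def row_cross_entropy(z):
        # Numerically stable log-softmax; the target for row i is column i.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.diag(log_probs)

    # Average the image-to-text and text-to-image directions for each pair.
    return 0.5 * (row_cross_entropy(logits) + row_cross_entropy(logits.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))                 # hypothetical image embeddings
txt = img + 0.1 * rng.normal(size=(4, 8))     # captions that roughly match their images
aligned = clip_style_loss(img, txt)

txt_bad = txt.copy()
txt_bad[[0, 1]] = txt_bad[[1, 0]]             # swap the captions of pairs 0 and 1
misaligned = clip_style_loss(img, txt_bad)
```

Comparing `aligned` with `misaligned` shows the per-pair loss rising for the two swapped pairs, since each now has a poor positive and a near-perfect negative in the same batch. A pointwise influence estimate has no term for that second effect, which is the gap the abstract describes.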

Taxonomy

Topics: Topic Modeling

Methods: Contrastive Learning