Where Did Your Model Learn That? Label-free Influence for Self-supervised Learning

Nidhin Harilal; Amit Kiran Rege; Reza Akbarian Bafghi; Maziar Raissi; Claire Monteleoni

arXiv:2412.17170·cs.LG·December 24, 2024

Open Access

TL;DR

This paper introduces Influence-SSL, a label-free influence function for self-supervised learning (SSL). It uses the stability of learned representations under data augmentations to identify the training examples that help explain model predictions, giving insight into how SSL models depend on their pretraining data.

Contribution

The paper proposes Influence-SSL, a novel influence function tailored for SSL that operates without labels, bridging a gap in understanding data contributions in self-supervised models.

Findings

01 Influence-SSL identifies influential training examples in SSL models.

02 SSL models respond to influential data differently than supervised models do.

03 Applications include duplicate detection, outlier detection, and fairness analysis.

Abstract

Self-supervised learning (SSL) has revolutionized learning from large-scale unlabeled datasets, yet the intrinsic relationship between pretraining data and the learned representations remains poorly understood. Traditional supervised learning benefits from gradient-based data attribution tools like influence functions that measure the contribution of an individual data point to model predictions. However, existing definitions of influence rely on labels, making them unsuitable for SSL settings. We address this gap by introducing Influence-SSL, a novel and label-free approach for defining influence functions tailored to SSL. Our method harnesses the stability of learned representations against data augmentations to identify training examples that help explain model predictions. We provide both theoretical foundations and empirical evidence to show the utility of Influence-SSL in…
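The abstract does not give the formula behind Influence-SSL, so the following is only a toy sketch of the general idea it describes: replace the labeled loss in a classic influence function with a label-free loss that measures how far apart two augmented views of the same input land in embedding space. Everything specific here is an assumption for illustration and not the authors' method: the linear encoder `W`, the multiplicative augmentation in `view_difference`, the squared-distance stability loss, and the `damping` term added to keep the Hessian invertible.

```python
import numpy as np

def view_difference(x, rng):
    """Difference between two random multiplicative augmentations of x (hypothetical augmentation)."""
    m0 = rng.uniform(0.5, 1.5, size=x.shape)
    m1 = rng.uniform(0.5, 1.5, size=x.shape)
    return x * (m0 - m1)

def stability_grad(W, u):
    """Gradient of the label-free loss 0.5 * ||W u||^2 with respect to W, flattened row-major."""
    return np.outer(W @ u, u).ravel()

def stability_hessian(W, u):
    """Hessian of 0.5 * ||W u||^2 with respect to row-major vec(W): one u u^T block per output row."""
    return np.kron(np.eye(W.shape[0]), np.outer(u, u))

def influence_scores(W, train_us, test_u, damping=1e-2):
    """Classic influence form, -g_test^T (H + damping * I)^{-1} g_train, for each training point."""
    H = np.mean([stability_hessian(W, u) for u in train_us], axis=0)
    H += damping * np.eye(H.shape[0])          # assumed damping for numerical stability
    h_inv_g_test = np.linalg.solve(H, stability_grad(W, test_u))
    return np.array([-h_inv_g_test @ stability_grad(W, u) for u in train_us])

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))                    # toy linear encoder: 5 inputs -> 3 embedding dims
X_train = rng.normal(size=(8, 5))              # 8 unlabeled training points
train_us = [view_difference(x, rng) for x in X_train]

# Score every training point against the first training point used as the query.
scores = influence_scores(W, train_us, train_us[0])
```

No labels appear anywhere in the computation: each score depends only on the inputs, the augmentations, and the encoder parameters. A real SSL encoder is nonlinear and far too large for an explicit Hessian, so a practical version would need an approximate inverse-Hessian-vector product instead of `np.linalg.solve`.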

Taxonomy

Topics: Topic Modeling