Where Did Your Model Learn That? Label-free Influence for Self-supervised Learning
Nidhin Harilal, Amit Kiran Rege, Reza Akbarian Bafghi, Maziar Raissi, Claire Monteleoni

TL;DR
This paper introduces Influence-SSL, a label-free influence function for self-supervised learning. It explains model predictions through the stability of learned representations under data augmentations, and it reveals how SSL models depend on their training data.
Contribution
The paper proposes Influence-SSL, a novel influence function tailored for SSL that operates without labels, bridging a gap in understanding data contributions in self-supervised models.
Findings
Influence-SSL effectively identifies influential training examples in SSL models.
SSL models respond differently to influential data compared to supervised models.
Applications include duplicate detection, outlier detection, and fairness analysis.
Abstract
Self-supervised learning (SSL) has revolutionized learning from large-scale unlabeled datasets, yet the intrinsic relationship between pretraining data and the learned representations remains poorly understood. Traditional supervised learning benefits from gradient-based data attribution tools like influence functions that measure the contribution of an individual data point to model predictions. However, existing definitions of influence rely on labels, making them unsuitable for SSL settings. We address this gap by introducing Influence-SSL, a novel and label-free approach for defining influence functions tailored to SSL. Our method harnesses the stability of learned representations against data augmentations to identify training examples that help explain model predictions. We provide both theoretical foundations and empirical evidence to show the utility of Influence-SSL in…
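The abstract's core idea, scoring training examples by how stable their learned representations are under augmentation, can be illustrated with a small sketch. This is not the paper's Influence-SSL estimator, whose exact definition is not given in this summary; it only shows a label-free stability score. The linear encoder, the Gaussian-noise augmentation, and the names `augment`, `encode`, and `stability_scores` are all hypothetical choices made for illustration.

```python
import numpy as np

def augment(x, rng, noise_scale=0.1):
    # Hypothetical augmentation: additive Gaussian noise (an assumption,
    # standing in for crops, color jitter, etc.).
    return x + noise_scale * rng.standard_normal(x.shape)

def encode(W, x):
    # Toy linear encoder (an assumption) with L2-normalized outputs.
    z = x @ W.T
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def stability_scores(W, X, n_views=8, seed=0):
    # Mean cosine similarity between each example's clean embedding and
    # the embeddings of its augmented views. No labels are used.
    rng = np.random.default_rng(seed)
    z_clean = encode(W, X)
    sims = []
    for _ in range(n_views):
        z_aug = encode(W, augment(X, rng))
        sims.append(np.sum(z_clean * z_aug, axis=-1))
    return np.mean(sims, axis=0)

# Synthetic data and random encoder weights, for illustration only.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((100, 16))
W = data_rng.standard_normal((8, 16))

scores = stability_scores(W, X)
# Examples whose representations change most under augmentation.
least_stable = np.argsort(scores)[:5]
```

Under these assumptions, the lowest-scoring examples are the ones whose representations shift most across views, which makes them natural candidates for closer inspection in a label-free analysis.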
Taxonomy
Topics: Topic Modeling
