Cross-Loss Influence Functions to Explain Deep Network Representations

Andrew Silva, Rohit Chopra, and Matthew Gombolay

arXiv:2012.01685 · cs.LG · May 5, 2022

TL;DR

This paper extends influence functions to unsupervised and semi-supervised deep learning, enabling model explainability and bias detection in settings where training and testing objectives differ.

Contribution

We introduce the first theoretical and empirical method for estimating influence in cross-loss settings, broadening explainability tools beyond supervised learning.

Findings

01. Cross-loss influence estimates outperform traditional methods.

02. The method enables explanation of cluster memberships.

03. The method identifies and mitigates biases in language models.

Abstract

As machine learning is increasingly deployed in the real world, it is paramount that we develop the tools necessary to analyze the decision-making of the models we train and deploy to end-users. Recently, researchers have shown that influence functions, a statistical measure of sample impact, can approximate the effects of training samples on classification accuracy for deep neural networks. However, this prior work only applies to supervised learning, where training and testing share an objective function. No approaches currently exist for estimating the influence of unsupervised training examples for deep learning models. To bring explainability to unsupervised and semi-supervised training regimes, we derive the first theoretical and empirical demonstration that influence functions can be extended to handle mismatched training and testing (i.e., "cross-loss") settings. Our formulation…

Code & Models

Repositories

core-robotics-lab/cross_loss_influence_functions (PyTorch, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications