Cross-Loss Influence Functions to Explain Deep Network Representations
Andrew Silva, Rohit Chopra, and Matthew Gombolay

TL;DR
This paper extends influence functions to unsupervised and semi-supervised deep learning, enabling model explainability and bias detection in settings where training and testing objectives differ.
Contribution
We introduce the first theoretical and empirical method for estimating influence in cross-loss settings, broadening explainability tools beyond supervised learning.
Findings
Cross-loss influence estimates outperform traditional methods.
The method enables explanation of cluster memberships.
It identifies and mitigates biases in language models.
Abstract
As machine learning is increasingly deployed in the real world, it is paramount that we develop the tools necessary to analyze the decision-making of the models we train and deploy to end-users. Recently, researchers have shown that influence functions, a statistical measure of sample impact, can approximate the effects of training samples on classification accuracy for deep neural networks. However, this prior work only applies to supervised learning, where training and testing share an objective function. No approaches currently exist for estimating the influence of unsupervised training examples for deep learning models. To bring explainability to unsupervised and semi-supervised training regimes, we derive the first theoretical and empirical demonstration that influence functions can be extended to handle mismatched training and testing (i.e., "cross-loss") settings. Our formulation…
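To make the "cross-loss" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the classical influence-function form (test-loss gradient, times the inverse Hessian of the training objective, times the training-loss gradient) and simply lets the test loss differ from the training loss. The toy one-parameter model, the choice of squared error for training and logistic loss for evaluation, and every function name are hypothetical, chosen for illustration only.

```python
import math

def fit(xs, ys, extra_weight=0.0, k=0):
    """Closed-form minimiser of mean squared error for y ~ theta * x.
    Optionally upweights training sample k by extra_weight (assumed setup)."""
    n = len(xs)
    num = sum(x * y for x, y in zip(xs, ys)) / n + extra_weight * xs[k] * ys[k]
    den = sum(x * x for x in xs) / n + extra_weight * xs[k] ** 2
    return num / den

def train_grad(theta, x, y):
    # Gradient of the training loss 0.5 * (theta * x - y)^2 w.r.t. theta.
    return (theta * x - y) * x

def eval_loss(theta, x, t):
    # A *different* objective at evaluation time: logistic loss, t in {-1, +1}.
    return math.log1p(math.exp(-t * theta * x))

def eval_grad(theta, x, t):
    # Gradient of the logistic evaluation loss w.r.t. theta.
    return -t * x / (1.0 + math.exp(t * theta * x))

def cross_loss_influence(xs, ys, k, x_eval, t_eval):
    """Estimated change in the evaluation loss per unit of extra weight
    placed on training sample k: -grad_eval * H^{-1} * grad_train."""
    theta = fit(xs, ys)
    hessian = sum(x * x for x in xs) / len(xs)  # Hessian of the training objective
    return -eval_grad(theta, x_eval, t_eval) * train_grad(theta, xs[k], ys[k]) / hessian

# Toy data (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]

for k in range(len(xs)):
    print(k, cross_loss_influence(xs, ys, k, x_eval=0.5, t_eval=1))
```

In this toy setting the estimate can be checked directly: refit with a small extra weight on sample k via `fit(xs, ys, extra_weight=eps, k=k)` and compare the resulting change in `eval_loss`, divided by `eps`, against the influence value. Deep networks have no closed-form refit, so the inverse-Hessian term would have to be approximated there.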
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
