Supervising Feature Influence
Shayak Sen, Piotr Mardziel, Anupam Datta, Matthew Fredrikson

TL;DR
This paper shows that classifiers trained to minimize empirical risk can agree on the training distribution yet have very different causal feature influences, and proposes an active learning method that constrains those influences to improve out-of-distribution generalization and interpretability.
Contribution
It introduces an active learning algorithm that constrains the trained model's feature influence measures, keeping its causal influences close to those of the labeler's model and improving out-of-distribution performance.
Findings
Models trained with the proposed method have causal influences close to the labeler's model.
The approach improves out-of-distribution generalization.
Accuracy on in-distribution data is retained.
Abstract
Causal influence measures for machine learnt classifiers shed light on the reasons behind classification, and aid in identifying influential input features and revealing their biases. However, such analyses involve evaluating the classifier using datapoints that may be atypical of its training distribution. Standard methods for training classifiers that minimize empirical risk do not constrain the behavior of the classifier on such datapoints. As a result, training to minimize empirical risk does not distinguish among classifiers that agree on predictions in the training distribution but have wildly different causal influences. We term this problem covariate shift in causal testing and formally characterize conditions under which it arises. As a solution to this problem, we propose a novel active learning algorithm that constrains the influence measures of the trained model. We prove…
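The problem described in the abstract can be illustrated with a small sketch. It assumes a toy setting that is not taken from the paper: two features that are perfectly correlated in the training distribution, two hypothetical classifiers `clf_a` and `clf_b` that each read only one of them, and a simple interventional influence measure (the fraction of points whose prediction changes when one feature is replaced by an independent draw from its marginal). The paper's own influence measure and formal conditions may differ.

```python
import random

def clf_a(x):
    # Hypothetical classifier that reads only feature 0.
    return int(x[0] > 0.5)

def clf_b(x):
    # Hypothetical classifier that reads only feature 1.
    return int(x[1] > 0.5)

def sample_training_distribution(n, rng):
    # Assumed toy distribution: both features always take the same value.
    points = []
    for _ in range(n):
        v = rng.random()
        points.append((v, v))
    return points

def agreement(f, g, points):
    # Fraction of points on which the two classifiers predict the same label.
    return sum(f(x) == g(x) for x in points) / len(points)

def influence(f, points, feature, rng):
    # Fraction of points whose prediction changes when `feature` is
    # replaced by a value drawn independently from its marginal.
    # The intervened points are generally off the training distribution.
    changed = 0
    for x in points:
        donor = rng.choice(points)
        intervened = list(x)
        intervened[feature] = donor[feature]
        changed += f(x) != f(tuple(intervened))
    return changed / len(points)

rng = random.Random(0)
points = sample_training_distribution(2000, rng)

agree = agreement(clf_a, clf_b, points)
infl_a = [influence(clf_a, points, j, rng) for j in (0, 1)]
infl_b = [influence(clf_b, points, j, rng) for j in (0, 1)]

print("agreement on training distribution:", agree)
print("influence of features on clf_a:", infl_a)
print("influence of features on clf_b:", infl_b)
```

In this toy setting the two classifiers are indistinguishable by empirical risk, since they agree on every training point, yet each attributes all influence to a different feature. The intervened points used to measure influence have mismatched feature values and so never occur in the training distribution, which is the sense in which influence analysis probes atypical datapoints.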
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Bayesian Modeling and Causal Inference
