Global-to-Local Support Spectrums for Language Model Explainability
Lucas Agussurja, Xinyang Lu, Bryan Kian Hsiang Low

TL;DR
This paper introduces support spectrums, a novel explanation method for language models that combines support sets and global-to-local importance measures to provide tailored, test point-specific explanations in image and text tasks.
Contribution
The paper proposes a new explanation approach using support spectrums, improving specificity and relevance over existing static, outlier-skewed methods.
Findings
Effective in image classification tasks
Provides tailored explanations for specific test points
Outperforms existing influence-based methods
Abstract
Existing sample-based methods, like influence functions and representer points, measure the importance of a training point by approximating the effect of its removal from training. As such, they are skewed towards outliers and points that are very close to the decision boundaries. The explanations provided by these methods are often static and not specific enough for different test points. In this paper, we propose a method to generate an explanation in the form of support spectrums, which are based on two main ideas: the support sets and a global-to-local importance measure. The support set is the set of training points, in the predicted class, that "lie in between" the test point and training points in the other classes. They indicate how well the test point can be distinguished from the points not in the predicted class. The global-to-local importance measure is obtained by…
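To make the support-set idea concrete, here is a minimal sketch of one possible geometric reading of "lie in between". The abstract does not give the formal criterion, so the rule below is an assumption for illustration only, not the authors' definition: a predicted-class point x counts as between the test point t and an other-class point z when the angle at x is at least 90 degrees, that is, (t - x) · (z - x) <= 0. The function name `support_set` and the use of raw feature vectors are likewise hypothetical.

```python
import numpy as np

def support_set(test_x, train_X, train_y, predicted_class):
    """Indices of predicted-class training points that lie 'in between'
    test_x and at least one training point of another class.

    ASSUMPTION (not from the paper): x is between t and z when
    (t - x) . (z - x) <= 0, i.e. x falls inside the ball whose
    diameter is the segment from t to z.
    """
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    t = np.asarray(test_x, dtype=float)

    same_class = np.flatnonzero(train_y == predicted_class)
    others = train_X[train_y != predicted_class]

    support = []
    for i in same_class:
        x = train_X[i]
        # One dot product per other-class point z: (z - x) . (t - x)
        dots = (others - x) @ (t - x)
        if np.any(dots <= 0):
            support.append(int(i))
    return support

# Toy 2-D example: two class-0 points on either side of the test point,
# and one class-1 point to the right.
train_X = [[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0]]
train_y = [0, 0, 1]
print(support_set([0.0, 0.0], train_X, train_y, predicted_class=0))
```

In this toy example only the class-0 point at (1, 0) sits between the test point at the origin and the class-1 point at (2, 0), so it is the only index returned. In practice the points would more plausibly be model representations than raw inputs; that choice is also left open here.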
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
