Decoding Layer Saliency in Language Transformers
Elizabeth M. Hou, Gregory Castanon

TL;DR
This paper presents a gradient-based method for identifying salient text features in large transformer language models used for classification. The method needs no extra training or labeled data, and it consistently improves on other textual saliency methods across multiple benchmark classification datasets.
Contribution
The paper adapts gradient-based saliency methods to transformer-stack language models and proposes a way to evaluate the semantic coherence of each layer, improving interpretability for text classification.
Findings
Consistent improvement over existing saliency methods.
No additional training or labeled data needed.
Efficient computation suitable for large models.
Abstract
In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. In visual networks, where saliency is better studied, saliency is naturally localized through the convolutional layers of the network; however, the same is not true in the modern transformer-stack networks used to process natural language. We adapt gradient-based saliency methods for these networks, propose a method for evaluating the degree of semantic coherence of each layer, and demonstrate consistent improvement over numerous other methods for textual saliency on multiple benchmark classification datasets. Our approach requires no additional training or access to labelled data, and is comparatively computationally efficient.
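To make the idea of layer-level gradient saliency concrete, here is a minimal sketch of one common variant (gradient times activation, summed over the hidden dimension and clipped at zero, in the spirit of Grad-CAM). It is not the authors' exact procedure: the toy classifier head, the weight names `W1` and `W2`, the random hidden states `H`, and all sizes are assumptions made only for illustration. In a real transformer, `H` would be the hidden states at a chosen layer and the gradient would come from automatic differentiation rather than a hand-derived formula.

```python
import numpy as np

def head_logits(H, W1, W2):
    """Toy stand-in for the layers above H: tanh projection, mean-pool, linear."""
    Z = np.tanh(H @ W1)            # (T, k) per-token features
    return W2 @ Z.mean(axis=0)     # (C,) class logits

def layer_grad(H, W1, W2, c):
    """Analytic gradient of logit c with respect to the hidden states H."""
    Z = np.tanh(H @ W1)                         # (T, k)
    dA = (1.0 - Z ** 2) * W2[c] / H.shape[0]    # chain rule through mean and tanh
    return dA @ W1.T                            # (T, d), same shape as H

def token_saliency(H, W1, W2, c):
    """Gradient-times-activation score per token, clipped at zero."""
    G = layer_grad(H, W1, W2, c)
    return np.maximum((G * H).sum(axis=1), 0.0)  # (T,)

# Hypothetical sizes: T tokens, d hidden dims, k head features, C classes.
rng = np.random.default_rng(0)
T, d, k, C = 5, 8, 6, 3
H = rng.normal(size=(T, d))          # assumed hidden states at one layer
W1 = rng.normal(size=(d, k)) * 0.3   # assumed head weights
W2 = rng.normal(size=(C, k))

pred = int(np.argmax(head_logits(H, W1, W2)))   # explain the predicted class
scores = token_saliency(H, W1, W2, pred)
print("predicted class:", pred)
print("per-token saliency:", np.round(scores, 4))
```

Because the scores depend only on a forward pass and one gradient with respect to a layer's hidden states, a sketch like this needs no extra training or labels, which matches the efficiency argument made in the abstract. Repeating the computation at each layer gives one saliency vector per layer to compare.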
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
