Decoding Layer Saliency in Language Transformers

Elizabeth M. Hou; Gregory Castanon

arXiv:2308.05219 · cs.CL · August 11, 2023

TL;DR

This paper presents a gradient-based method for identifying salient textual features in large language transformers applied to classification. It improves interpretability without extra training or labelled data and consistently outperforms other textual saliency methods on multiple benchmark classification datasets.

Contribution

It adapts gradient-based saliency methods, which are well studied in convolutional vision networks, to transformer-stack language models and proposes a method for evaluating the semantic coherence of each layer, improving interpretability for text classification.

Findings

1. Consistent improvement over numerous other textual saliency methods on multiple benchmark classification datasets.
2. No additional training or labelled data needed.
3. Comparatively efficient computation, suited to large models.

Abstract

In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. In visual networks, where saliency is better studied, saliency is naturally localized through the convolutional layers of the network; however, the same is not true in modern transformer-stack networks used to process natural language. We adapt gradient-based saliency methods for these networks, propose a method for evaluating the degree of semantic coherence of each layer, and demonstrate consistent improvement over numerous other methods for textual saliency on multiple benchmark classification datasets. Our approach requires no additional training or access to labelled data, and is comparatively very computationally efficient.
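The abstract does not give the exact formulation, so the following is only an illustrative sketch of one common way to carry gradient-based saliency over to a layer stack: a Grad-CAM-style score in which each hidden dimension at layer l is weighted by its token-averaged gradient of the class score, and the weighted activations are summed per token and passed through a ReLU. Everything here is an assumption for illustration: the toy model (token-wise tanh layers with no attention or residuals, mean pooling, a linear classification head), the hand-written backprop, and all names (`forward`, `backward`, `class_score`, `layer_saliency`) are hypothetical and are not the authors' architecture or code. The paper's semantic-coherence evaluation is not reproduced.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Return a @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def matvec_t(a: Matrix, x: Vector) -> Vector:
    """Return a^T @ x without building the transpose."""
    return [sum(a[i][j] * x[i] for i in range(len(a))) for j in range(len(a[0]))]


def forward(tokens: List[Vector], layers: List[Matrix]) -> List[List[Vector]]:
    """states[l][t] is the hidden state of token t after l toy layers (l = 0 is the input)."""
    states = [tokens]
    for a in layers:
        # Hypothetical token-wise layer: h' = tanh(A h); no attention, no residual.
        states.append([[math.tanh(z) for z in matvec(a, h)] for h in states[-1]])
    return states


def class_score(states: List[List[Vector]], head: Matrix, c: int) -> float:
    """Logit of class c: linear head applied to the mean-pooled last layer."""
    last = states[-1]
    pooled = [sum(col) / len(last) for col in zip(*last)]
    return sum(w * p for w, p in zip(head[c], pooled))


def backward(states, layers: List[Matrix], head: Matrix, c: int) -> List[List[Vector]]:
    """grads[l][t] = d class_score / d states[l][t], via hand-written backprop."""
    n_tokens = len(states[0])
    grads = [[] for _ in states]
    # Mean pooling spreads the head weights equally over the tokens.
    grads[-1] = [[w / n_tokens for w in head[c]] for _ in range(n_tokens)]
    for l in range(len(layers) - 1, -1, -1):
        grads[l] = [
            # Chain rule through tanh (derivative 1 - y^2), then through A.
            matvec_t(layers[l], [g * (1.0 - y * y) for g, y in zip(g_t, y_t)])
            for g_t, y_t in zip(grads[l + 1], states[l + 1])
        ]
    return grads


def layer_saliency(states, grads, l: int) -> Vector:
    """Grad-CAM-style token saliency at layer l (ReLU of gradient-weighted activations)."""
    h, g = states[l], grads[l]
    # Channel weights: token-averaged gradient of the class score.
    alpha = [sum(col) / len(g) for col in zip(*g)]
    return [max(0.0, sum(a * x for a, x in zip(alpha, h_t))) for h_t in h]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # 3 tokens, 2-dim embeddings
    layers = [[[0.8, -0.2], [0.3, 0.5]], [[0.6, 0.1], [-0.4, 0.9]]]
    head = [[1.0, -1.0], [-0.5, 0.7]]  # 2 classes
    states = forward(tokens, layers)
    grads = backward(states, layers, head, c=0)
    for l in range(len(states)):
        print(f"layer {l}:", [round(s, 3) for s in layer_saliency(states, grads, l)])
```

A single backward pass yields gradients at every layer, so a saliency map per layer comes at little extra cost, which is consistent in spirit with the efficiency claim above. In a real transformer an autograd framework would replace the hand-written `backward`, and attention would make each token's gradient depend on the other tokens, something this toy deliberately omits.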

Videos

Decoding Layer Saliency in Language Transformers (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques