An Attribution Method for Siamese Encoders

Lucas Möller; Dmitry Nikolaev; Sebastian Padó

arXiv:2310.05703 · cs.CL · November 30, 2023

TL;DR

This paper introduces a novel attribution method for Siamese encoders that explains model predictions by attributing importance to pairs of input features, enabling interpretability of models like sentence transformers.

Contribution

It generalizes integrated gradients to models with multiple inputs, providing a formal, convergent attribution method for Siamese encoders that highlights influential feature pairs.

Findings

01 A few token pairs can often explain a large fraction of a prediction.

02 The method highlights nouns and verbs as the most important features.

03 Accurate predictions nonetheless require attending to most input tokens.

Abstract

Despite the success of Siamese encoder models such as sentence transformers (ST), little is known about the aspects of inputs they pay attention to. A barrier is that their predictions cannot be attributed to individual features, as they compare two inputs rather than processing a single one. This paper derives a local attribution method for Siamese encoders by generalizing the principle of integrated gradients to models with multiple inputs. The solution takes the form of feature-pair attributions, and can be reduced to a token-token matrix for STs. Our method involves the introduction of integrated Jacobians and inherits the advantageous formal properties of integrated gradients: it accounts for the model's full computation graph and is guaranteed to converge to the actual prediction. A pilot study shows that in an ST few token-pairs can often explain large fractions of predictions,…
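The feature-pair idea in the abstract can be illustrated with a minimal sketch. It assumes a toy one-layer encoder `tanh(W x)`, a dot-product similarity, and zero reference inputs; these choices, and all function names below, are hypothetical illustrations and are not taken from the paper or from the xsbert repository. The sketch approximates each integrated Jacobian with a midpoint Riemann sum and combines the two into a pair-attribution matrix whose entries sum (approximately) to the prediction relative to the references.

```python
import numpy as np

def encoder(x, W):
    # Toy stand-in for a Siamese encoder branch (assumption: one tanh layer).
    return np.tanh(W @ x)

def encoder_jacobian(x, W):
    # Analytic Jacobian of tanh(W x) with respect to x: diag(1 - tanh^2) @ W.
    h = np.tanh(W @ x)
    return (1.0 - h**2)[:, None] * W

def integrated_jacobian(x, ref, W, steps=200):
    # Midpoint Riemann sum of the Jacobian along the straight path ref -> x.
    alphas = (np.arange(steps) + 0.5) / steps
    jac = np.zeros((W.shape[0], x.shape[0]))
    for alpha in alphas:
        jac += encoder_jacobian(ref + alpha * (x - ref), W)
    return jac / steps

def pair_attributions(a, b, ref_a, ref_b, W, steps=200):
    # A[i, j] = (a - ref_a)_i * (J_a^T J_b)_ij * (b - ref_b)_j
    J_a = integrated_jacobian(a, ref_a, W, steps)
    J_b = integrated_jacobian(b, ref_b, W, steps)
    da = a - ref_a
    db = b - ref_b
    return da[:, None] * (J_a.T @ J_b) * db[None, :]

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)) * 0.5   # hypothetical encoder weights
a = rng.normal(size=3)              # features of the first input
b = rng.normal(size=3)              # features of the second input
ref = np.zeros(3)                   # reference input (assumption: all zeros)

A = pair_attributions(a, b, ref, ref, W)

# Quantity the attributions should sum to: the dot product of the
# embedding differences relative to the reference embeddings.
target = (encoder(a, W) - encoder(ref, W)) @ (encoder(b, W) - encoder(ref, W))
```

Here `A` has one entry per feature pair. For a sentence transformer, the corresponding matrix would be indexed by embedding dimensions of both inputs and could then be summed per token to obtain the token-token matrix the abstract mentions. The closeness of `A.sum()` to `target` depends on the number of integration steps.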

Code & Models

Repositories

lucasmllr/xsbert
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Ferroelectric and Negative Capacitance Devices