Explaining Text Similarity in Transformer Models
Alexandros Vasileiou, Oliver Eberle

TL;DR
This paper applies layer-wise relevance propagation (LRP) and its second-order extension, BiLRP, to explain Transformer-based similarity models in NLP, showing which feature interactions drive a predicted similarity score.
Contribution
The paper uses BiLRP to compute second-order explanations for Transformer similarity models, enabling detailed analysis of feature interactions in NLP tasks.
Findings
BiLRP effectively reveals feature interactions driving similarity.
Explainability methods improve understanding of multilingual semantics.
Insights assist in biomedical text retrieval analysis.
Abstract
As Transformers have become state-of-the-art models for natural language processing (NLP) tasks, the need to understand and explain their predictions is increasingly apparent. Especially in unsupervised applications, such as information retrieval tasks, similarity models built on top of foundation model representations have been widely applied. However, their inner prediction mechanisms have mostly remained opaque. Recent advances in explainable AI have made it possible to mitigate these limitations by leveraging improved explanations for Transformers through layer-wise relevance propagation (LRP). Using BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, we investigate which feature interactions drive similarity in NLP models. We validate the resulting explanations and demonstrate their utility in three corpus-level use cases, analyzing…
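To make "second-order explanation" concrete, the following is a minimal sketch for a toy linear embedding, not the Transformer setup studied in the paper. It assumes a similarity score y(x, x') = <Wx, Wx'>, for which the contribution of each feature pair (i, j) can be written exactly as x_i (W^T W)_ij x'_j. The names `interaction_relevance`, `W`, `x`, and `x_prime` are hypothetical and chosen for illustration only; for deep models, BiLRP obtains such pairwise scores by propagating relevance through the layers rather than by this closed form.

```python
import numpy as np


def interaction_relevance(W, x, x_prime):
    """Pairwise relevance for the toy similarity y = <W x, W x_prime>.

    Returns a matrix R with R[i, j] = x[i] * (W^T W)[i, j] * x_prime[j].
    This closed form is an assumption that holds only for a linear embedding.
    """
    gram = W.T @ W  # (d, d) matrix of feature-to-feature interactions
    return x[:, None] * gram * x_prime[None, :]


# Hypothetical toy data: a 4-dimensional embedding of 6 input features.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
x = rng.normal(size=6)
x_prime = rng.normal(size=6)

# Similarity score as a dot product in embedding space.
similarity = float((W @ x) @ (W @ x_prime))

# Second-order explanation: one relevance score per pair of input features.
R = interaction_relevance(W, x, x_prime)

# Conservation check: the pairwise scores sum to the similarity score.
print(np.isclose(R.sum(), similarity))
```

The conservation property checked on the last line is what makes the matrix readable as an explanation: each entry is the share of the similarity score attributed to one pair of input features.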
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Digital Humanities and Scholarship
