Evaluating the Effectiveness of XAI Techniques for Encoder-Based Language Models
Melkamu Abay Mersha, Mesay Gemeda Yigezu, Jugal Kalita

TL;DR
This paper introduces a general evaluation framework for XAI techniques applied to encoder-based language models, comparing six methods on four metrics across five models to identify the most effective explainability approaches.
Contribution
It provides a systematic evaluation of six XAI techniques using four key metrics across five language models, highlighting the strengths and weaknesses of each method.
Findings
LIME outperforms other XAI methods across multiple metrics.
AMV shows superior robustness and consistency.
LRP excels in contrastivity, especially on complex models.
Abstract
The black-box nature of large language models (LLMs) necessitates the development of eXplainable AI (XAI) techniques for transparency and trustworthiness. However, evaluating these techniques remains a challenge. This study presents a general evaluation framework using four key metrics: Human-reasoning Agreement (HA), Robustness, Consistency, and Contrastivity. We assess the effectiveness of six explainability techniques from five different XAI categories: model simplification (LIME), perturbation-based methods (SHAP), gradient-based approaches (InputXGradient, Grad-CAM), Layer-wise Relevance Propagation (LRP), and attention-mechanism-based explainability methods (Attention Mechanism Visualization, AMV). The techniques are compared across five encoder-based language models (TinyBERT, BERTbase, BERTlarge, XLM-R large, and DeBERTa-xlarge) using the IMDB Movie Reviews and Tweet Sentiment Extraction (TSE) datasets. Our…
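To make one of the listed techniques concrete, here is a minimal sketch of the InputXGradient idea: each input feature's attribution is its value multiplied by the gradient of the model output with respect to it. The toy logistic classifier, the token list, and the weights are all hypothetical and chosen only for illustration; this is not the paper's setup, which applies the method to encoder-based language models.

```python
import math


def input_x_gradient(weights, bias, features):
    """Return (probability, attributions) for a toy logistic classifier.

    Hypothetical stand-in for a real model: prob = sigmoid(w . x + b).
    """
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    # For a logistic model, d(prob)/d(x_i) = prob * (1 - prob) * w_i.
    grads = [prob * (1.0 - prob) * w for w in weights]
    # InputXGradient: multiply each input by its gradient.
    attributions = [x * g for x, g in zip(features, grads)]
    return prob, attributions


# Illustrative inputs only (assumed, not from the paper).
tokens = ["a", "truly", "great", "movie"]
weights = [0.0, 0.5, 2.0, 0.1]
features = [1.0, 1.0, 1.0, 1.0]  # one feature per token, all present

prob, attributions = input_x_gradient(weights, 0.0, features)

# Rank tokens by absolute attribution, most influential first.
ranked = sorted(zip(tokens, attributions), key=lambda p: abs(p[1]), reverse=True)
for token, score in ranked:
    print(f"{token}: {score:.4f}")
```

In this toy case the token with the largest weight receives the largest attribution, and a token with zero weight receives none. With a real encoder the gradient would come from automatic differentiation over token embeddings rather than a closed-form expression.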
Taxonomy
Methods: Softmax · Attention Is All You Need · XLM-R
