PLEX: Perturbation-free Local Explanations for LLM-Based Text Classification
Yogachandran Rahulamathavan, Misbah Farooq, Varuna De Silva

TL;DR
PLEX introduces a perturbation-free method for explaining LLM-based text classification, significantly reducing computational costs while maintaining high agreement with traditional perturbation-based explanations.
Contribution
This paper presents PLEX, a novel neural network approach that provides efficient local explanations for LLMs without perturbations. It is far faster than existing perturbation-based methods while achieving comparable accuracy.
Findings
Achieves over 92% agreement with LIME and SHAP.
Reduces explanation time by two to four orders of magnitude.
Effectively identifies influential words impacting classification.
Abstract
Large Language Models (LLMs) excel in text classification, but their complexity hinders interpretability, making it difficult to understand the reasoning behind their predictions. Explainable AI (XAI) methods like LIME and SHAP offer local explanations by identifying influential words, but they rely on computationally expensive perturbations. These methods typically generate thousands of perturbed sentences and perform inferences on each, incurring a substantial computational burden, especially with LLMs. To address this, we propose Perturbation-free Local Explanation (PLEX), a novel method that leverages the contextual embeddings extracted from the LLM and a "Siamese network" style neural network trained to align with feature importance scores. This one-off training eliminates the need for subsequent perturbations, enabling efficient explanations…
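To make the cost of perturbation concrete, here is a minimal sketch of a random-masking explainer run against a toy classifier. It is not LIME, SHAP, or PLEX. The keyword weights, `CountingModel`, and `random_mask_importance` are hypothetical names introduced only for illustration, and the toy scorer stands in for an LLM. The point is that a single explanation requires one model inference per perturbed sentence.

```python
import random

# Hypothetical keyword weights; a toy stand-in for an LLM classifier.
WEIGHTS = {"great": 2.0, "boring": -2.0}


class CountingModel:
    """Toy classifier that counts how many times it is queried."""

    def __init__(self):
        self.calls = 0

    def predict(self, tokens):
        self.calls += 1
        return sum(WEIGHTS.get(t, 0.0) for t in tokens)


def random_mask_importance(model, tokens, num_samples=1000, seed=0):
    """Score each token as: mean output when kept minus mean output when dropped.

    Every sampled mask costs one model inference.
    """
    rng = random.Random(seed)
    n = len(tokens)
    kept_sum, kept_cnt = [0.0] * n, [0] * n
    drop_sum, drop_cnt = [0.0] * n, [0] * n
    for _ in range(num_samples):
        mask = [rng.random() < 0.5 for _ in range(n)]
        score = model.predict([t for t, keep in zip(tokens, mask) if keep])
        for i, keep in enumerate(mask):
            if keep:
                kept_sum[i] += score
                kept_cnt[i] += 1
            else:
                drop_sum[i] += score
                drop_cnt[i] += 1
    return [
        kept_sum[i] / max(kept_cnt[i], 1) - drop_sum[i] / max(drop_cnt[i], 1)
        for i in range(n)
    ]


tokens = "the movie was great".split()
model = CountingModel()
scores = random_mask_importance(model, tokens, num_samples=1000)
top_word = tokens[max(range(len(tokens)), key=lambda i: scores[i])]
print(f"model calls: {model.calls}, most influential word: {top_word}")
```

With a real LLM in place of the toy scorer, each of those calls is a full forward pass, which is the overhead a trained, perturbation-free explainer is meant to avoid at explanation time.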
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques
