Towards Transparent AI: A Survey on Explainable Large Language Models
Avash Palikhe, Zhenyu Yu, Zichong Wang, Wenbin Zhang

TL;DR
This survey reviews explainability techniques for large language models, categorizing methods based on transformer architectures, evaluating their effectiveness, and discussing future challenges to enhance transparency and responsible AI deployment.
Contribution
The survey provides a systematic categorization and evaluation of XAI methods for LLMs, addressing the limited systematic understanding of how explainability techniques apply across different transformer architectures.
Findings
Categorizes XAI methods into encoder-only, decoder-only, and encoder-decoder models.
Analyzes evaluation metrics for explainability effectiveness.
Discusses practical applications and future research challenges.
Abstract
Large Language Models (LLMs) have played a pivotal role in advancing Artificial Intelligence (AI). However, despite their achievements, LLMs often struggle to explain their decision-making processes, making them a 'black box' and presenting a substantial challenge to explainability. This lack of transparency poses a significant obstacle to the adoption of LLMs in high-stakes domain applications, where interpretability is particularly essential. To overcome these limitations, researchers have developed various explainable artificial intelligence (XAI) methods that provide human-interpretable explanations for LLMs. However, a systematic understanding of these methods remains limited. To address this gap, this survey provides a comprehensive review of explainability techniques by categorizing XAI methods based on the underlying transformer architectures of LLMs: encoder-only, decoder-only,…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
