Explainability of Large Language Models: Opportunities and Challenges toward Generating Trustworthy Explanations
Shahin Atakishiyev, Housam K.B. Babiker, Jiayi Dai, Nawshad Farruque, Teruaki Hayashi, Nafisa Sadaf Hriti, Md Abed Rahman, Iain Smith, Mi-Young Kim, Osmar R. Za\"iane, Randy Goebel

TL;DR
This paper reviews methods for understanding and interpreting large language models, especially in critical domains like healthcare and autonomous driving, highlighting challenges and future opportunities for trustworthy explanations.
Contribution
It provides a comprehensive review of explainability approaches, experimental insights in key domains, and outlines future challenges for trustworthy LLM explanations.
Findings
Explainability approaches enhance trust in LLMs.
Experimental studies reveal domain-specific interpretability challenges.
Identifies key open issues and future research directions.
Abstract
Large language models have exhibited impressive performance across a broad range of downstream tasks in natural language processing. However, how a language model predicts the next token and generates content is not generally understandable by humans. Furthermore, these models often make errors in prediction and reasoning, known as hallucinations. These errors underscore the urgent need to better understand and interpret the intricate inner workings of language models and how they generate predictive outputs. Motivated by this gap, this paper investigates local explainability and mechanistic interpretability within Transformer-based large language models to foster trust in such models. In this regard, our paper aims to make three key contributions. First, we present a review of local explainability and mechanistic interpretability approaches and insights from relevant studies in the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
