Explainability of Large Language Models: Opportunities and Challenges toward Generating Trustworthy Explanations

Shahin Atakishiyev; Housam K.B. Babiker; Jiayi Dai; Nawshad Farruque; Teruaki Hayashi; Nafisa Sadaf Hriti; Md Abed Rahman; Iain Smith; Mi-Young Kim; Osmar R. Zaïane; Randy Goebel

arXiv:2510.17256 · cs.CL · October 21, 2025


TL;DR

This paper reviews methods for understanding and interpreting large language models, especially in critical domains like healthcare and autonomous driving, highlighting challenges and future opportunities for trustworthy explanations.

Contribution

It provides a comprehensive review of explainability approaches and experimental insights in key domains, and it outlines future challenges for trustworthy LLM explanations.

Findings

01. Explainability approaches enhance trust in LLMs.
02. Experimental studies reveal domain-specific interpretability challenges.
03. The paper identifies key open issues and future research directions.

Abstract

Large language models have exhibited impressive performance across a broad range of downstream tasks in natural language processing. However, how a language model predicts the next token and generates content is not generally understandable by humans. Furthermore, these models often make errors in prediction and reasoning, known as hallucinations. These errors underscore the urgent need to better understand and interpret the intricate inner workings of language models and how they generate predictive outputs. Motivated by this gap, this paper investigates local explainability and mechanistic interpretability within Transformer-based large language models to foster trust in such models. In this regard, our paper aims to make three key contributions. First, we present a review of local explainability and mechanistic interpretability approaches and insights from relevant studies in the…


Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications