From XAI to Stories: A Factorial Study of LLM-Generated Explanation Quality

Fabian Lukassen; Jan Herrmann; Christoph Weisser; Benjamin Saefken; Thomas Kneib

arXiv:2601.02224 · cs.CL · March 16, 2026

Open Access

TL;DR

This factorial study examines how forecasting model choice, XAI method, LLM selection, and prompting strategy affect the quality of natural language explanations generated for time-series forecasts.

Contribution

The paper provides a factorial analysis of the factors that drive the quality of LLM-generated explanations. It finds that LLM choice dominates the other factors and reports an interpretability paradox: SARIMAX yields lower explanation quality despite higher forecasting accuracy.

Findings

01

XAI offers limited benefits over no-XAI for non-expert users.

02

LLM choice significantly impacts explanation quality, with DeepSeek-R1 outperforming others.

03

SARIMAX models produce lower explanation quality despite higher accuracy.

Abstract

Explainable AI (XAI) methods like SHAP and LIME produce numerical feature attributions that remain inaccessible to non-expert users. Prior work has shown that Large Language Models (LLMs) can transform these outputs into natural language explanations (NLEs), but it remains unclear which factors contribute to high-quality explanations. We present a systematic factorial study investigating how forecasting model choice, XAI method, LLM selection, and prompting strategy affect NLE quality. Our design spans four models (XGBoost (XGB), Random Forest (RF), Multilayer Perceptron (MLP), and SARIMAX, comparing black-box machine learning (ML) against classical time-series approaches), three XAI conditions (SHAP, LIME, and a no-XAI baseline), three LLMs (GPT-4o, Llama-3-8B, DeepSeek-R1), and eight prompting strategies. Using G-Eval, an LLM-as-a-judge evaluation method, with dual LLM judges and…
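
The factorial design described in the abstract can be sketched as a grid of conditions. The snippet below is only an illustration, not the authors' code: it assumes the four factors are fully crossed (the paper may exclude or adapt some cells), and because the abstract gives only the number of prompting strategies, the labels `prompt_1` … `prompt_8` are hypothetical placeholders. The function name `build_design` is likewise an assumption.

```python
from itertools import product

# Factor levels as named in the abstract.
FORECASTING_MODELS = ["XGB", "RF", "MLP", "SARIMAX"]
XAI_CONDITIONS = ["SHAP", "LIME", "no-XAI"]
LLMS = ["GPT-4o", "Llama-3-8B", "DeepSeek-R1"]

# Hypothetical placeholders: the abstract states there are eight
# prompting strategies but does not name them.
PROMPT_STRATEGIES = [f"prompt_{i}" for i in range(1, 9)]


def build_design():
    """Enumerate every combination of the four factors (assumed fully crossed)."""
    return [
        {"model": model, "xai": xai, "llm": llm, "prompt": prompt}
        for model, xai, llm, prompt in product(
            FORECASTING_MODELS, XAI_CONDITIONS, LLMS, PROMPT_STRATEGIES
        )
    ]


design = build_design()
print(len(design))  # 4 * 3 * 3 * 8 = 288 conditions under full crossing
```

Under this assumption each condition would correspond to one explanation-generation setting to be scored by the judges; how many explanations are generated per condition is not stated in the excerpt.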

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications