Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations

Qianli Wang; Nils Feldhus; Pepa Atanasova; Fedor Splitt; Simon Ostermann; Sebastian Möller; Vera Schmitt

arXiv:2601.00282·cs.CL·January 5, 2026

Open Access

TL;DR

This paper investigates how quantization, a technique for model compression, affects the quality, faithfulness, and trustworthiness of self-explanations generated by large language models, revealing moderate declines but overall robustness.

Contribution

It provides the first comprehensive analysis of quantization's impact on LLM self-explanations, including natural language explanations and counterfactuals, across different techniques and model sizes.

Findings

01 Quantization causes a decline of up to 4.4% in self-explanation (SE) quality.

02 Quantization reduces the faithfulness of SEs by up to 2.38%.

03 User trust in SEs and their perceived coherence decrease by up to 8.5%.

Abstract

Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generated by LLMs to justify their own outputs, require reasoning about the model's own decision-making process, a capability that may exhibit particular sensitivity to quantization. As SEs are increasingly relied upon for transparency in high-stakes applications, understanding whether and to what extent quantization degrades SE quality and faithfulness is critical. To address this gap, we examine two types of SEs: natural language explanations (NLEs) and counterfactual examples, generated by LLMs quantized using three common techniques at distinct bit widths. Our findings indicate that quantization typically leads to moderate declines in both SE quality (up to 4.4%) and faithfulness (up to 2.38%). The…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI