Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations
Qianli Wang, Nils Feldhus, Pepa Atanasova, Fedor Splitt, Simon Ostermann, Sebastian Möller, Vera Schmitt

TL;DR
This paper investigates how quantization, a technique for model compression, affects the quality, faithfulness, and trustworthiness of self-explanations generated by large language models, revealing moderate declines but overall robustness.
Contribution
It provides the first comprehensive analysis of quantization's impact on LLM self-explanations, including natural language explanations and counterfactuals, across different techniques and model sizes.
Findings
Quantization causes a decline of up to 4.4% in self-explanation (SE) quality.
Faithfulness of SEs drops by up to 2.38% due to quantization.
User trust and coherence in SEs decrease by up to 8.5%.
Abstract
Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generated by LLMs to justify their own outputs, require reasoning about the model's own decision-making process, a capability that may exhibit particular sensitivity to quantization. As SEs are increasingly relied upon for transparency in high-stakes applications, understanding whether and to what extent quantization degrades SE quality and faithfulness is critical. To address this gap, we examine two types of SEs: natural language explanations (NLEs) and counterfactual examples, generated by LLMs quantized using three common techniques at distinct bit widths. Our findings indicate that quantization typically leads to moderate declines in both SE quality (up to 4.4%) and faithfulness (up to 2.38%). The…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI
