Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations

Shiyuan Huang; Siddarth Mamidanna; Shreedhar Jangam; Yilun Zhou; Leilani H. Gilpin

arXiv:2310.11207·cs.CL·October 18, 2023·20 cites

Open Access

TL;DR

This paper evaluates the quality of self-explanations generated by ChatGPT for sentiment analysis, comparing them to traditional methods, and discusses implications for interpretability practices in LLMs.

Contribution

It systematically assesses the faithfulness and other characteristics of ChatGPT's self-explanations, highlighting their potential and limitations relative to traditional explanation methods.

Findings

01

Self-explanations perform comparably to traditional methods in faithfulness.

02

They are cheaper to produce, since they are generated together with the prediction.

03

Self-explanations differ significantly from traditional explanations according to agreement metrics.
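
The third finding refers to agreement metrics between explanations. As a hedged sketch, assume each explanation is a dictionary mapping words to signed importance scores, and assume top-k overlap metrics of the kind used in the explanation-disagreement literature (feature, rank, and sign agreement); the paper's exact metric definitions may differ. The functions `top_k`, `feature_agreement`, `rank_agreement`, and `sign_agreement` are illustrative names, and all scores below are made up.

```python
from typing import Dict, List

# An explanation maps each word to a signed importance score
# (positive pushes toward "positive" sentiment, negative toward "negative").
Explanation = Dict[str, float]


def top_k(attr: Explanation, k: int) -> List[str]:
    """Return the k words with the largest absolute score, most important first."""
    return sorted(attr, key=lambda w: abs(attr[w]), reverse=True)[:k]


def feature_agreement(a: Explanation, b: Explanation, k: int) -> float:
    """Fraction of top-k words that appear in both explanations (order ignored)."""
    return len(set(top_k(a, k)) & set(top_k(b, k))) / k


def rank_agreement(a: Explanation, b: Explanation, k: int) -> float:
    """Fraction of top-k positions where both explanations rank the same word."""
    return sum(x == y for x, y in zip(top_k(a, k), top_k(b, k))) / k


def sign_agreement(a: Explanation, b: Explanation, k: int) -> float:
    """Fraction of top-k words shared by both explanations and given the same sign."""
    shared = set(top_k(a, k)) & set(top_k(b, k))
    return sum((a[w] > 0) == (b[w] > 0) for w in shared) / k


# Hypothetical scores: one parsed from an LLM self-explanation, one from LIME.
self_expl = {"fantastic": 0.9, "memorable": 0.7, "boring": -0.4, "plot": 0.1}
lime_expl = {"fantastic": 0.6, "boring": -0.5, "plot": 0.3, "memorable": 0.2}

for k in (2, 3):
    print(k, feature_agreement(self_expl, lime_expl, k),
          rank_agreement(self_expl, lime_expl, k),
          sign_agreement(self_expl, lime_expl, k))
```

On this toy pair, both explanations put "fantastic" first but then diverge, so the top-2 feature agreement is 0.5 even though each explanation looks plausible on its own. Low scores on such metrics say the explanations differ, not which one is more faithful.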

Abstract

Large language models (LLMs) such as ChatGPT have demonstrated superior performance on a variety of natural language processing (NLP) tasks including sentiment analysis, mathematical reasoning and summarization. Furthermore, since these models are instruction-tuned on human conversations to produce "helpful" responses, they can and often will produce explanations along with the response, which we call self-explanations. For example, when analyzing the sentiment of a movie review, the model may output not only the positivity of the sentiment, but also an explanation (e.g., by listing the sentiment-laden words such as "fantastic" and "memorable" in the review). How good are these automatically generated self-explanations? In this paper, we investigate this question on the task of sentiment analysis and for feature attribution explanation, one of the most commonly studied settings in the…
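
The abstract frames self-explanations as feature attributions, such as a list of sentiment-laden words like "fantastic" and "memorable". One widely used way to test whether such an attribution is faithful is a removal-based score often called comprehensiveness: delete the words the explanation ranks highest and measure how much the model's confidence in its prediction drops. The sketch below is a minimal illustration under stated assumptions, not the paper's evaluation code: the lexicon classifier `p_positive` is a hypothetical stand-in for the LLM, and the word rankings are made up.

```python
import math
from typing import Callable, List

# Hypothetical stand-in for the model under study: a tiny lexicon-based
# sentiment scorer. A real study would query the LLM instead.
LEXICON = {"fantastic": 2.0, "memorable": 1.5, "boring": -2.0, "dull": -1.0}


def p_positive(tokens: List[str]) -> float:
    """Probability of positive sentiment: logistic of summed lexicon scores."""
    score = sum(LEXICON.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def comprehensiveness(
    tokens: List[str],
    ranked_words: List[str],
    k: int,
    predict: Callable[[List[str]], float],
) -> float:
    """Drop in the predicted class's probability after deleting the top-k words.

    A larger drop suggests the explanation picked words the model relies on.
    """
    p_full = predict(tokens)
    removed = set(ranked_words[:k])
    # Delete every occurrence of the removed words from the input.
    p_reduced = predict([t for t in tokens if t not in removed])
    if p_full >= 0.5:  # predicted class is positive
        return p_full - p_reduced
    # Predicted class is negative: (1 - p_full) - (1 - p_reduced).
    return p_reduced - p_full


review = "a fantastic and memorable film despite a boring middle".split()
# Hypothetical ranking parsed from a self-explanation that lists key words.
self_explanation = ["fantastic", "memorable"]
baseline = ["film", "a"]  # uninformative words, for comparison

print(comprehensiveness(review, self_explanation, 2, p_positive))
print(comprehensiveness(review, baseline, 2, p_positive))
```

On this toy input, deleting the two words named by the self-explanation drops the positive probability from about 0.82 to about 0.12, while deleting two uninformative words leaves it unchanged. Scoring the predicted class, rather than always the positive class, keeps the metric meaningful for negative reviews as well.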

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science

Methods: Sparse Evolutionary Training · Local Interpretable Model-Agnostic Explanations