Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations
Shiyuan Huang, Siddarth Mamidanna, Shreedhar Jangam, Yilun Zhou, Leilani H. Gilpin

TL;DR
This paper evaluates the quality of self-explanations generated by ChatGPT for sentiment analysis, comparing them to traditional methods, and discusses implications for interpretability practices in LLMs.
Contribution
It systematically assesses the faithfulness and characteristics of ChatGPT's self-explanations, highlighting their potential and limitations compared to traditional explanation methods.
Findings
Self-explanations perform comparably to traditional methods in faithfulness.
They are cheaper to produce because they are generated together with the prediction.
Self-explanations differ significantly from traditional explanations according to agreement metrics.
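To make the idea of an agreement metric concrete, here is a minimal sketch of how two feature attribution explanations for the same review could be compared. This summary does not name the exact metrics used in the paper, so the two measures below (top-k feature agreement and top-k sign agreement) are illustrative assumptions, as are the function names and the toy attribution scores.

```python
from typing import Dict, List


def top_k_features(attributions: Dict[str, float], k: int) -> List[str]:
    """Return the k words with the largest absolute attribution score."""
    ranked = sorted(attributions, key=lambda w: abs(attributions[w]), reverse=True)
    return ranked[:k]


def feature_agreement(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Fraction of the top-k words that the two explanations share."""
    shared = set(top_k_features(a, k)) & set(top_k_features(b, k))
    return len(shared) / k


def sign_agreement(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Fraction of the top-k words that are shared and have the same sign."""
    shared = set(top_k_features(a, k)) & set(top_k_features(b, k))
    same_sign = [w for w in shared if (a[w] > 0) == (b[w] > 0)]
    return len(same_sign) / k


# Hypothetical scores for one review (made up for illustration only).
self_explanation = {"fantastic": 0.9, "memorable": 0.7, "movie": 0.1, "boring": -0.2}
occlusion_scores = {"fantastic": 0.8, "movie": 0.5, "memorable": 0.3, "boring": -0.6}

print(feature_agreement(self_explanation, occlusion_scores, k=2))
print(sign_agreement(self_explanation, occlusion_scores, k=2))
```

With these toy scores, the two explanations share only one of their top two words ("fantastic"), so both measures come out to 0.5. A low value of this kind is what "differ significantly according to agreement metrics" would look like in practice, even when both explanations score similarly on faithfulness.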
Abstract
Large language models (LLMs) such as ChatGPT have demonstrated superior performance on a variety of natural language processing (NLP) tasks including sentiment analysis, mathematical reasoning and summarization. Furthermore, since these models are instruction-tuned on human conversations to produce "helpful" responses, they can and often will produce explanations along with the response, which we call self-explanations. For example, when analyzing the sentiment of a movie review, the model may output not only the positivity of the sentiment, but also an explanation (e.g., by listing the sentiment-laden words such as "fantastic" and "memorable" in the review). How good are these automatically generated self-explanations? In this paper, we investigate this question on the task of sentiment analysis and for feature attribution explanation, one of the most commonly studied settings in the…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science
Methods: Sparse Evolutionary Training · Local Interpretable Model-Agnostic Explanations
