The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning
Xi Ye, Greg Durrett

TL;DR
This paper investigates whether including explanations in prompts improves the reasoning of large language models. It finds only limited benefits and shows that generated explanations are often unreliable, but it also proposes post-hoc methods that assess explanation reliability and use it to improve predictions.
Contribution
The study systematically evaluates the impact of explanations in prompting LLMs for reasoning tasks and introduces a calibration method to assess explanation reliability.
Findings
Explanations yield only small to moderate accuracy improvements for most tested models; text-davinci-002 benefits more substantially.
Generated explanations may not entail the model's predictions or be factually grounded in the input.
Calibrators can improve performance by assessing explanation reliability.
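As an illustration of the calibration finding, here is a minimal sketch that combines the model's own answer probability with a cheap proxy for whether the explanation is grounded in the input, then fits a small classifier that predicts whether the answer is correct. The proxy (token overlap between explanation and input), the two-feature logistic regression, and every function name below are illustrative assumptions, not the paper's exact calibrator.

```python
import math

def overlap_score(explanation: str, context: str) -> float:
    """Assumed grounding proxy: fraction of explanation tokens found in the context."""
    exp_tokens = explanation.lower().split()
    if not exp_tokens:
        return 0.0
    ctx_tokens = set(context.lower().split())
    return sum(t in ctx_tokens for t in exp_tokens) / len(exp_tokens)

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_calibrator(features, labels, lr=0.5, epochs=2000):
    """Plain logistic regression trained by batch gradient descent.

    features: list of feature vectors [model_prob, overlap]; labels: 1 if the
    model's answer was correct, else 0.
    """
    n_feat = len(features[0])
    w, b, n = [0.0] * n_feat, 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def calibrated_confidence(w, b, model_prob, explanation, context):
    """Calibrated probability that the answer is correct, given both features."""
    x = [model_prob, overlap_score(explanation, context)]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical toy data: grounded explanations paired with correct answers.
ctx = "Alice lives in Paris with Bob"
grounded = "Alice lives in Paris"
ungrounded = "Bob moved to Rome last year"
X = [[0.9, overlap_score(grounded, ctx)], [0.9, overlap_score(ungrounded, ctx)]] * 4
y = [1, 0] * 4
w, b = fit_calibrator(X, y)
```

With identical raw model probabilities, the fitted calibrator assigns higher confidence to the answer whose explanation overlaps the input, which is the intuition behind using explanation reliability post-hoc.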
Abstract
Does prompting a large language model (LLM) like GPT-3 with explanations improve in-context learning? We study this question on two NLP tasks that involve reasoning over text, namely question answering and natural language inference. We test the performance of four LLMs on three textual reasoning datasets using prompts that include explanations in multiple different styles. For these tasks, we find that including explanations in the prompts for OPT, GPT-3 (davinci), and InstructGPT (text-davinci-001) only yields small to moderate accuracy improvements over standard few-shot learning. However, text-davinci-002 is able to benefit more substantially. We further show that explanations generated by the LLMs may not entail the models' predictions nor be factually grounded in the input, even on simple tasks with extractive explanations. However, these flawed explanations can still be useful…
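To make "prompts that include explanations" concrete, the sketch below builds a few-shot prompt in which each demonstration carries an explanation, placed either before or after the answer. The `Context:`/`Q:`/`Explanation:`/`A:` labels, the `Example` fields, and the two orderings are illustrative assumptions, not the exact templates used in the paper.

```python
from dataclasses import dataclass

@dataclass
class Example:
    """One hypothetical demonstration: input, question, explanation, gold answer."""
    context: str
    question: str
    explanation: str
    answer: str

def format_example(ex: Example, explain_first: bool) -> str:
    """Render a demonstration with the explanation before or after the answer."""
    header = f"Context: {ex.context}\nQ: {ex.question}\n"
    if explain_first:
        return header + f"Explanation: {ex.explanation}\nA: {ex.answer}\n"
    return header + f"A: {ex.answer}\nExplanation: {ex.explanation}\n"

def build_prompt(shots, test_context: str, test_question: str, explain_first: bool = True) -> str:
    """Concatenate demonstrations and leave the test query open for the LLM to complete."""
    demos = "\n".join(format_example(s, explain_first) for s in shots)
    query = f"Context: {test_context}\nQ: {test_question}\n"
    # The model continues from the final label: an explanation first, or the answer first.
    query += "Explanation:" if explain_first else "A:"
    return demos + "\n" + query

shots = [Example("Tom is taller than Ann.", "Who is taller?",
                 "The context says Tom is taller than Ann.", "Tom")]
prompt = build_prompt(shots, "Sue is older than Max.", "Who is older?")
```

Placing the explanation first lets it condition the answer; placing it after means the explanation is produced post hoc, and neither ordering guarantees that the generated explanation actually entails the prediction.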
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · OPT · Linear Layer · Cosine Annealing · Byte Pair Encoding · Linear Warmup With Cosine Annealing · Residual Connection · Attention Dropout
