The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning

Xi Ye; Greg Durrett

arXiv:2205.03401 · cs.CL · October 14, 2022 · 52 citations

TL;DR

This paper investigates whether including explanations in few-shot prompts improves large language model reasoning. It finds that the benefits are limited and that generated explanations are often unreliable, but it also proposes methods to assess explanation quality post hoc and to use that assessment to improve performance.

Contribution

The study systematically evaluates the impact of explanations in prompting LLMs for reasoning tasks and introduces a calibration method to assess explanation reliability.

Findings

01

For most of the models tested, explanations yield only small to moderate accuracy improvements over standard few-shot prompting.

02

Generated explanations may not be factually grounded in the input and may not entail the model's predictions.

03

Calibrators can improve performance by assessing explanation reliability.
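
As a rough illustration of what "assessing explanation reliability" could look like, the sketch below scores how much of a generated explanation is lexically supported by the input context and flags explanations that fall below a cutoff. This is only a minimal sketch under stated assumptions: the function names (`tokenize`, `grounding_score`, `is_reliable`), the token-overlap heuristic, and the 0.7 threshold are all hypothetical choices made for illustration, not the calibration method described in the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def grounding_score(explanation, context):
    """Fraction of explanation tokens that also occur in the context.

    Hypothetical proxy for factual grounding: an explanation whose
    words mostly come from the input is more likely to be supported.
    """
    exp_tokens = tokenize(explanation)
    if not exp_tokens:
        return 0.0  # an empty explanation offers no support at all
    context_vocab = set(tokenize(context))
    supported = sum(1 for tok in exp_tokens if tok in context_vocab)
    return supported / len(exp_tokens)


def is_reliable(explanation, context, threshold=0.7):
    """Flag an explanation as reliable if its score meets the cutoff.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return grounding_score(explanation, context) >= threshold


context = "Mary went to the park yesterday."
print(is_reliable("Mary went to the park", context))
print(is_reliable("John drove home", context))
```

A score like this could serve as one feature for a downstream calibrator that decides whether to trust a prediction; a simple overlap measure will miss paraphrases, so it is best read as a starting point.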

Abstract

Does prompting a large language model (LLM) like GPT-3 with explanations improve in-context learning? We study this question on two NLP tasks that involve reasoning over text, namely question answering and natural language inference. We test the performance of four LLMs on three textual reasoning datasets using prompts that include explanations in multiple different styles. For these tasks, we find that including explanations in the prompts for OPT, GPT-3 (davinci), and InstructGPT (text-davinci-001) only yields small to moderate accuracy improvements over standard few-shot learning. However, text-davinci-002 is able to benefit more substantially. We further show that explanations generated by the LLMs may not entail the models' predictions nor be factually grounded in the input, even on simple tasks with extractive explanations. However, these flawed explanations can still be useful…

Code & Models

Repositories

xiye17/textualexplincontext (official)

Videos

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning · SlidesLive

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · OPT · Linear Layer · Cosine Annealing · Byte Pair Encoding · Linear Warmup With Cosine Annealing · Residual Connection · Attention Dropout