Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models

Shashank Sonkar; Richard G. Baraniuk

arXiv:2305.14507 · cs.CL · May 25, 2023 · 1 cites

Open Access 1 Datasets

TL;DR

This paper investigates whether large language models can reason logically over manipulated or falsified evidence, and finds that they struggle to draw the conclusions that follow from the distorted facts given in the prompt.

Contribution

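The study introduces the DUPE task and a DUPEd version of the StrategyQA dataset, in which facts are manipulated to reverse the answer, to evaluate how LLMs reason over perturbed evidence. It documents where the models struggle and explores prompt-based strategies to mitigate the problem.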

Findings

01. GPT models' accuracy drops by 45% on the manipulated data compared with the original dataset

02. Prompt strategies inspired by student simulation improve reasoning accuracy

03. LLMs show limited ability to reason over falsified evidence

Abstract

We explore whether Large Language Models (LLMs) are capable of logical reasoning with distorted facts, which we call Deduction under Perturbed Evidence (DUPE). DUPE presents a unique challenge to LLMs since they typically rely on their parameters, which encode mostly accurate information, to reason and make inferences. However, in DUPE, LLMs must reason over manipulated or falsified evidence present in their prompts, which can result in false conclusions that are valid only under the manipulated evidence. Our goal with DUPE is to determine whether LLMs can arrive at these false conclusions and identify whether the dominant factor influencing the deduction process is the encoded data in the parameters or the manipulated evidence in the prompts. To evaluate the DUPE capabilities of LLMs, we create a DUPEd version of the StrategyQA dataset, where facts are manipulated to reverse the answer…
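The evaluation setup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Example` fields, the prompt wording, the two stand-in "models", and the toy penguin question are hypothetical and are not taken from the paper or from the DUPEd StrategyQA dataset. The sketch only shows the idea that a perturbed fact flips the label, so a model answering from memory scores well on the original item and poorly on the DUPEd one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """Hypothetical record layout; the real dataset schema may differ."""
    question: str
    facts: List[str]   # evidence placed in the prompt
    answer: bool       # yes/no label that follows from `facts`


def build_prompt(ex: Example) -> str:
    """Put the (possibly perturbed) facts in front of the question."""
    facts = "\n".join(f"- {fact}" for fact in ex.facts)
    return (
        f"Facts:\n{facts}\n"
        f"Question: {ex.question}\n"
        "Answer yes or no, using only the facts above."
    )


def accuracy(model: Callable[[str], bool], examples: List[Example]) -> float:
    """Fraction of examples where the model's yes/no matches the label."""
    correct = sum(model(build_prompt(ex)) == ex.answer for ex in examples)
    return correct / len(examples)


# Invented toy pair: same question, one fact perturbed so the answer flips.
original = Example(
    question="Can a penguin fly to its nest?",
    facts=["Penguins are flightless birds."],
    answer=False,
)
duped = Example(
    question="Can a penguin fly to its nest?",
    facts=["Penguins are strong fliers."],
    answer=True,
)


def parametric_model(prompt: str) -> bool:
    """Stand-in for a model that ignores the prompt and answers from memory."""
    return False


def evidence_model(prompt: str) -> bool:
    """Stand-in for a model that follows the evidence given in the prompt."""
    return "strong fliers" in prompt


if __name__ == "__main__":
    for name, model in [("parametric", parametric_model),
                        ("evidence", evidence_model)]:
        print(name,
              "original:", accuracy(model, [original]),
              "DUPEd:", accuracy(model, [duped]))
```

In a real run, the stand-in functions would be replaced by calls to an actual LLM, and the gap between accuracy on the original and DUPEd sets would indicate whether the parameters or the prompt evidence dominates the deduction.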

Code & Models

Datasets

luffycodes/DUPEd_StrategyQA
dataset · 56 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)

Methods: Attention Is All You Need · Cosine Annealing · Dense Connections · Weight Decay · Residual Connection · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Softmax · Layer Normalization