Stress Testing Chain-of-Thought Prompting for Large Language Models

Aayush Mishra; Karan Thakkar

arXiv:2309.16621·cs.CL·September 29, 2023

TL;DR

This paper investigates how three types of perturbation to Chain-of-Thought (CoT) prompts (order, values, and operators) affect GPT-3's reasoning accuracy, finding that correct CoT values matter most and that order and operator errors are less damaging.

Contribution

It provides a detailed analysis of how CoT prompt perturbations influence LLM performance, revealing the critical role of correct CoT values and the relative robustness to order and operator perturbations.

Findings

01. Incorrect CoT prompting reduces accuracy

02. Correct CoT values are essential for correct answers

03. Order and operator errors have less impact than value errors

Abstract

This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs). Inspired by previous studies (Min et al., 2022), we analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators, on the performance of GPT-3 on various tasks. Our findings show that incorrect CoT prompting leads to poor accuracy. Correct values in the CoT are crucial for predicting correct answers. Moreover, incorrect demonstrations in which the CoT operators or the CoT order are wrong do not degrade performance as drastically as value-based perturbations. This research deepens our understanding of CoT prompting and raises new questions about the capability of LLMs to learn reasoning in context.
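To make the three perturbation types concrete, here is a minimal Python sketch of how a single few-shot demonstration might be perturbed. It is an illustration under stated assumptions, not the authors' implementation: the demonstration text, the function names (`perturb_order`, `perturb_values`, `perturb_operators`, `render`), the choice to reverse steps for the order perturbation, and the decision to leave the final answer unchanged are all hypothetical.

```python
import random
import re

# Hypothetical few-shot demonstration: a question, its chain of thought
# (one arithmetic step per entry), and the final answer.
DEMO = {
    "question": "Tom has 5 apples, buys 3 more, then doubles his total. "
                "How many apples does he have?",
    "cot": ["5 + 3 = 8", "8 * 2 = 16"],
    "answer": "16",
}

# Assumed operator mapping: each operator is swapped for a different one.
OPERATOR_SWAP = {"+": "-", "-": "+", "*": "/", "/": "*"}


def perturb_order(cot):
    """Order perturbation (assumed form): reverse the reasoning steps."""
    return list(reversed(cot))


def perturb_values(cot, rng):
    """Value perturbation: replace every number with a different integer."""
    def replace(match):
        old = int(match.group())
        new = rng.randint(1, 99)
        while new == old:
            new = rng.randint(1, 99)
        return str(new)

    return [re.sub(r"\d+", replace, step) for step in cot]


def perturb_operators(cot):
    """Operator perturbation: swap each arithmetic operator, keep the values."""
    return [
        re.sub(r"[+\-*/]", lambda m: OPERATOR_SWAP[m.group()], step)
        for step in cot
    ]


def render(demo, cot):
    """Format a demonstration as prompt text; the answer is left unchanged."""
    steps = " ".join(step + "." for step in cot)
    return f"Q: {demo['question']}\nA: {steps} The answer is {demo['answer']}."


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the value perturbation is repeatable
    variants = {
        "original": DEMO["cot"],
        "order": perturb_order(DEMO["cot"]),
        "values": perturb_values(DEMO["cot"], rng),
        "operators": perturb_operators(DEMO["cot"]),
    }
    for name, cot in variants.items():
        print(f"[{name}]\n{render(DEMO, cot)}\n")
```

Each perturbed demonstration would then be placed in the few-shot prompt in place of the original, and accuracy on held-out questions compared across the variants.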

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Dense Connections · Layer Normalization · Attention Dropout · Cosine Annealing