Stress Testing Chain-of-Thought Prompting for Large Language Models
Aayush Mishra, Karan Thakkar

TL;DR
This paper investigates how different types of perturbations in Chain-of-Thought prompting affect large language models' reasoning accuracy, highlighting the importance of correct CoT values and the varying impact of different perturbations.
Contribution
It provides a detailed analysis of how CoT prompt perturbations influence LLM performance, revealing the critical role of correct CoT values and the relative robustness to certain perturbations.
Findings
Incorrect CoT prompting reduces accuracy
Correct CoT values are essential for correct answers
Order and operator errors have less impact than value errors
Abstract
This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs). Inspired by previous studies \cite{Min2022RethinkingWork}, we analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators, on the performance of GPT-3 on various tasks. Our findings show that incorrect CoT prompting leads to poor performance on accuracy metrics. Correct values in the CoT are crucial for predicting correct answers. Moreover, incorrect demonstrations in which the CoT operators or the CoT order are wrong affect performance less drastically than value-based perturbations. This research deepens our understanding of CoT prompting and opens new questions regarding the capability of LLMs to learn reasoning in context.
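To make the three perturbation types concrete, here is a minimal Python sketch of how a CoT demonstration might be corrupted along each axis. It is an illustration only, not the authors' procedure: the function names, the representation of a chain as a list of arithmetic step strings, and the random replacement ranges are all assumptions introduced here.

```python
import random
import re

# Assumption: a CoT demonstration is a list of arithmetic steps such as "5 + 6 = 11".
NUMBER = re.compile(r"\d+(?:\.\d+)?")
OPERATOR = re.compile(r"(?<= )[+\-*/](?= )")  # operators surrounded by spaces
OPERATORS = ["+", "-", "*", "/"]


def perturb_order(steps, rng):
    """Order perturbation: shuffle the steps, forcing a changed order when possible."""
    if len(set(steps)) < 2:
        return list(steps)  # nothing to reorder
    shuffled = list(steps)
    while shuffled == list(steps):
        rng.shuffle(shuffled)
    return shuffled


def perturb_values(steps, rng):
    """Value perturbation: replace every number with a different random integer."""
    def swap(match):
        old = match.group(0)
        new = old
        while new == old:
            new = str(rng.randint(1, 99))
        return new

    return [NUMBER.sub(swap, step) for step in steps]


def perturb_operators(steps, rng):
    """Operator perturbation: replace each operator with a different one."""
    def swap(match):
        old = match.group(0)
        return rng.choice([op for op in OPERATORS if op != old])

    return [OPERATOR.sub(swap, step) for step in steps]


if __name__ == "__main__":
    rng = random.Random(0)
    chain = ["5 + 6 = 11", "11 - 2 = 9"]  # hypothetical demonstration
    print("order:    ", perturb_order(chain, rng))
    print("values:   ", perturb_values(chain, rng))
    print("operators:", perturb_operators(chain, rng))
```

Each function leaves the other two aspects of the chain untouched, so the effect of one perturbation type on downstream accuracy can be measured in isolation.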
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Dense Connections · Layer Normalization · Attention Dropout · Cosine Annealing
