Premise Order Matters in Reasoning with Large Language Models
Xinyun Chen, Ryan A. Chi, Xuezhi Wang, Denny Zhou

TL;DR
This paper shows that the reasoning accuracy of large language models depends strongly on the order in which premises are presented: ordering the premises to match the steps of the reasoning greatly improves performance on deductive reasoning and mathematical problem-solving tasks.
Contribution
The study systematically demonstrates the impact of premise order on LLM reasoning performance and introduces the R-GSM benchmark for further evaluation.
Findings
Permuting the premise order can reduce LLM reasoning accuracy by over 30%.
Aligning premise order with reasoning steps improves performance.
Introduces the R-GSM benchmark for evaluating ordering effects in math problems.
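One way to quantify how closely a premise order follows the reasoning steps is Kendall's tau rank correlation between each premise's position in the prompt and its position in the ground-truth proof. The sketch below assumes that metric (this summary does not name one) and uses a hypothetical helper, `kendall_tau`. A value of 1 means the premises appear exactly in proof order, and -1 means they appear fully reversed.

```python
from itertools import combinations


def kendall_tau(order: list[int]) -> float:
    """Kendall tau between presentation order and proof order (hypothetical helper).

    order[i] is the proof-step index of the premise shown at prompt position i.
    """
    n = len(order)
    if n < 2:
        return 1.0  # convention: zero or one premise is trivially "in order"
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # i < j in the prompt; check whether the proof uses them in the same order
        if order[i] < order[j]:
            concordant += 1
        elif order[i] > order[j]:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau([0, 1, 2, 3]))  # 1.0: premises in proof order
print(kendall_tau([3, 2, 1, 0]))  # -1.0: premises fully reversed
print(kendall_tau([1, 0, 2, 3]))  # about 0.667: one adjacent pair swapped
```

Because tau only compares relative positions, it scores partial alignment smoothly, which makes it convenient for grouping permuted prompts by how far they depart from the proof order.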
Abstract
Large language models (LLMs) have accomplished remarkable reasoning performance in various domains. However, in the domain of reasoning tasks, we discover a frailty: LLMs are surprisingly brittle to the ordering of the premises, despite the fact that such ordering does not alter the underlying task. In particular, we observe that LLMs achieve the best performance when the premise order aligns with the context required in intermediate reasoning steps. For example, in deductive reasoning tasks, presenting the premises in the same order as the ground truth proof in the prompt (as opposed to random ordering) drastically increases the model's accuracy. We first examine the effect of premise ordering on deductive reasoning on a variety of LLMs, and our evaluation shows that permuting the premise order can cause a performance drop of over 30%. In addition, we release the benchmark R-GSM, based…
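The abstract's central point, that premise order changes accuracy even though it does not change the underlying task, can be illustrated with a toy chain of if-then rules. The sketch below is a hypothetical illustration, not the paper's method, prompts, or data; `make_chain`, `single_pass_derive`, `closure`, and `render_prompt` are invented names. A reader that visits each rule once, in presentation order, reaches the goal only when the rules follow the proof order, while full forward chaining reaches it for any order.

```python
import random


def make_chain(length: int) -> tuple[str, list[tuple[str, str]]]:
    """Hypothetical chain P0 -> P1 -> ... -> P{length}; rules are listed in proof order."""
    atoms = [f"P{i}" for i in range(length + 1)]
    rules = [(atoms[i], atoms[i + 1]) for i in range(length)]
    return atoms[0], rules


def single_pass_derive(fact: str, rules: list[tuple[str, str]]) -> set[str]:
    """Naive reader: visit each rule once, in the order presented, and fire it if possible."""
    known = {fact}
    for premise, conclusion in rules:
        if premise in known:
            known.add(conclusion)
    return known


def closure(fact: str, rules: list[tuple[str, str]]) -> set[str]:
    """Forward chaining to a fixed point; the derived set does not depend on rule order."""
    known = {fact}
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def render_prompt(fact: str, rules: list[tuple[str, str]], goal: str) -> str:
    """Render the premises as a plain-text prompt, keeping the given rule order."""
    lines = [f"If {p}, then {c}." for p, c in rules]
    lines.append(f"{fact} is true.")
    lines.append(f"Question: is {goal} true?")
    return "\n".join(lines)


fact, rules = make_chain(5)
goal = rules[-1][1]  # "P5", reachable only by applying every rule
shuffled = rules[:]
random.Random(0).shuffle(shuffled)

for name, order in [("forward", rules), ("backward", rules[::-1]), ("shuffled", shuffled)]:
    # columns: ordering name, single-pass result, fixed-point result
    print(name, goal in single_pass_derive(fact, order), goal in closure(fact, order))

print(render_prompt(fact, rules[::-1], goal))
```

Under single-pass reading the forward order derives the goal and the backward order does not, whereas `closure` derives it for every ordering. This is only an analogy for order sensitivity and makes no claim about how LLMs actually process a prompt.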
Taxonomy
Topics: Natural Language Processing Techniques
