Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models
Zikai Xie

TL;DR
This paper investigates how the order of reasoning and answering affects the consistency of large language models (LLMs) and their tendency to hallucinate. It also proposes a new benchmark and prompting strategy to improve their factual reliability.
Contribution
It introduces an order-based benchmark for assessing LLM consistency and a reflexive prompting method to reduce hallucinations and factual errors.
Findings
The order of reasoning and answering affects LLM consistency
The proposed prompt improves factual accuracy
The benchmark effectively identifies instances of hallucination
Abstract
Large language models (LLMs) have attracted significant attention since their inception and have found applications across a wide range of academic and industrial domains. However, these models often suffer from the "hallucination problem": their outputs, though grammatically and logically coherent, lack factual accuracy or are entirely fabricated. A particularly troubling issue, discovered and widely discussed recently, is a numerical comparison error in which multiple LLMs incorrectly infer that "9.11 > 9.9". We discovered that the order in which LLMs generate answers and reasoning affects their consistency. Specifically, results vary significantly when an LLM generates an answer first and then provides the reasoning, versus when it generates the reasoning process first and then the conclusion. Inspired by this, we propose a new benchmark method for assessing LLM consistency: comparing responses generated…
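The benchmark idea described in the abstract involves asking the same question answer-first and reasoning-first, then checking whether the two final answers agree. It can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code. The model is assumed to be a Python function from prompt to response. The prompt templates, the `Answer:` output convention, and the helpers `extract_answer`, `order_consistency`, and `larger_decimal` are all hypothetical. So is `stub_model`, a stand-in that mimics the reported 9.11 vs. 9.9 failure. `larger_decimal` supplies the ground truth using exact decimal arithmetic.

```python
import re
from decimal import Decimal
from typing import Callable, Optional

# Hypothetical prompt templates; the paper's exact wording is not reproduced here.
ANSWER_FIRST = (
    "{question}\n"
    "State your final answer first as 'Answer: <value>', then explain your reasoning."
)
REASONING_FIRST = (
    "{question}\n"
    "Explain your reasoning step by step, then end with 'Answer: <value>'."
)


def extract_answer(response: str) -> Optional[str]:
    """Return the token after the first 'Answer:' marker, or None if there is none."""
    match = re.search(r"Answer:\s*(\S+)", response)
    # Drop trailing punctuation such as "9.11." -> "9.11".
    return match.group(1).rstrip(".,;") if match else None


def order_consistency(ask_model: Callable[[str], str], question: str) -> dict:
    """Ask the same question answer-first and reasoning-first; report agreement."""
    answer_first = extract_answer(ask_model(ANSWER_FIRST.format(question=question)))
    reasoning_first = extract_answer(ask_model(REASONING_FIRST.format(question=question)))
    return {
        "answer_first": answer_first,
        "reasoning_first": reasoning_first,
        # Missing answers never count as agreement.
        "consistent": answer_first is not None and answer_first == reasoning_first,
    }


def larger_decimal(a: str, b: str) -> str:
    """Ground truth: compare the two values with exact decimal arithmetic."""
    return a if Decimal(a) > Decimal(b) else b


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM that mimics the reported failure mode."""
    if "State your final answer first" in prompt:
        # Answer-first: commits to an answer, then rationalizes it.
        return "Answer: 9.11. Because 11 is greater than 9."
    # Reasoning-first: aligns decimal places before concluding.
    return "9.11 = 9.110 and 9.9 = 9.900, and 0.900 > 0.110. Answer: 9.9"


if __name__ == "__main__":
    question = "Which number is larger, 9.11 or 9.9?"
    print(order_consistency(stub_model, question))
    # {'answer_first': '9.11', 'reasoning_first': '9.9', 'consistent': False}
    print(larger_decimal("9.11", "9.9"))  # 9.9
```

Exact string equality is the simplest agreement criterion. In a real evaluation, answers such as "9.90" and "9.9" would need normalization (for example, through `Decimal`) before comparison.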
