When is the consistent prediction likely to be a correct prediction?

Alex Nguyen; Dheeraj Mekala; Chengyu Dong; Jingbo Shang

arXiv:2407.05778 · cs.CL · July 9, 2024

TL;DR

This paper argues that consistent answers drawn from longer, self-generated reasoning chains are more likely to be correct than the most consistent answer across all sampled outputs, which refines the assumption behind self-consistency.

Contribution

The paper shows that LLMs can produce chain-of-thought style reasoning without custom prompts when they generate longer responses, and that taking response length into account during decoding yields more accurate consistent predictions.

Findings

01 Longer responses lead to more accurate predictions.

02 Sampling multiple outputs improves self-consistency performance.

03 Long responses are infrequent, requiring length-conditioned decoding.

Abstract

Self-consistency (Wang et al., 2023) suggests that the most consistent answer obtained through large language models (LLMs) is more likely to be correct. In this paper, we challenge this argument and propose a nuanced correction. Our observations indicate that consistent answers derived through more computation, i.e., longer reasoning texts, rather than simply the most consistent answer across all outputs, are more likely to be correct. This is predominantly because, as we demonstrate, LLMs can autonomously produce chain-of-thought (CoT) style reasoning with no custom prompts merely while generating longer responses, which lead to consistent predictions that are more accurate. In the zero-shot setting, by sampling the Mixtral-8x7B model multiple times and considering longer responses, we achieve 86% of its self-consistency performance obtained through zero-shot CoT prompting on the GSM8K and…
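
The idea of voting only over longer responses can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace word count used as a length measure, the `min_words` threshold, and the toy samples are all assumptions made for illustration, and the toy data is invented rather than taken from the paper.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer, or None if the list is empty."""
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


def length_filtered_vote(samples, min_words):
    """Majority vote restricted to responses with at least `min_words` words.

    `samples` is a list of (response_text, extracted_answer) pairs.
    Length is approximated by a whitespace word count (an assumption).
    """
    long_answers = [
        answer for text, answer in samples if len(text.split()) >= min_words
    ]
    return majority_vote(long_answers)


# Invented toy samples: short responses agree on one answer,
# longer responses that spell out their reasoning agree on another.
samples = [
    ("20", "20"),
    ("The answer is 20.", "20"),
    ("Probably 20.", "20"),
    ("There are 16 eggs, 3 are eaten and 4 are baked, "
     "so 9 remain; 9 times 2 is 18.", "18"),
    ("Start with 16, subtract 3 and 4 to get 9, "
     "then double it to reach 18.", "18"),
]

plain = majority_vote([answer for _, answer in samples])
filtered = length_filtered_vote(samples, min_words=8)
print(plain, filtered)
```

In this toy case the plain vote follows the three short responses, while the length-filtered vote follows the two longer ones. The threshold is hypothetical, and a real setup would also need a way to extract the final answer from each response.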

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Chain-of-thought prompting