Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning

Akshara Prabhakar; Thomas L. Griffiths; R. Thomas McCoy

arXiv:2407.01687·cs.CL·October 7, 2024·2 cites

TL;DR

This paper investigates how output probability, memorization, and noisy reasoning influence the effectiveness of Chain-of-Thought prompting in large language models, finding that CoT performance reflects both memorized patterns and a noisy, probabilistic form of reasoning.

Contribution

The paper provides a detailed case study of how probability, memorization, and noise affect CoT performance across three LLMs (GPT-4, Claude 3, and Llama 3.1), using the symbolic task of decoding shift ciphers.

Findings

01

Accuracy varies with the probability of the expected output, e.g., from 26% to 70% for GPT-4.

02

CoT performance is affected by memorization and probabilistic reasoning.

03

Factors like task complexity and learned patterns significantly influence reasoning accuracy.

Abstract

Chain-of-Thought (CoT) prompting has been shown to enhance the multi-step reasoning capabilities of Large Language Models (LLMs). However, debates persist about whether LLMs exhibit abstract generalization or rely on shallow heuristics when given CoT prompts. To understand the factors influencing CoT reasoning, we provide a detailed case study of the symbolic reasoning task of decoding shift ciphers, where letters are shifted forward some number of steps in the alphabet. We analyze the pattern of results produced by three LLMs -- GPT-4, Claude 3, and Llama 3.1 -- performing this task using CoT prompting. By focusing on a single relatively simple task, we are able to identify three factors that systematically affect CoT performance: the probability of the task's expected output (probability), what the model has implicitly learned during pre-training (memorization), and the number of…
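For readers unfamiliar with the task, the shift cipher described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper or its repository: the function names `shift_encode` and `shift_decode` are hypothetical, and the choice to leave non-letter characters unchanged is an assumption.

```python
import string


def shift_decode(ciphertext: str, shift: int) -> str:
    """Decode a shift cipher by moving each letter `shift` steps back in the alphabet.

    Illustrative helper (hypothetical name); non-letters pass through unchanged,
    which is an assumption rather than something the source specifies.
    """
    decoded = []
    for ch in ciphertext:
        if ch in string.ascii_lowercase:
            # Map to 0-25, undo the shift with wrap-around, map back to a letter.
            decoded.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            decoded.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            decoded.append(ch)
    return "".join(decoded)


def shift_encode(plaintext: str, shift: int) -> str:
    """Encode by shifting letters forward; the inverse of shift_decode."""
    return shift_decode(plaintext, -shift)


if __name__ == "__main__":
    secret = shift_encode("stay", 13)
    print(secret, "->", shift_decode(secret, 13))
```

Decoding one letter at a time in this way mirrors the step-by-step structure that a CoT prompt asks a model to spell out, with the shift size determining how many alphabet steps each letter requires.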

Code & Models

Repositories

aksh555/deciphering_cot
pytorch · Official

Videos

Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning · Underline

Taxonomy

Topics: Mental Health Research Topics · Opinion Dynamics and Social Influence · Advanced Text Analysis Techniques

Methods: Attention Is All You Need · LLaMA · Linear Layer · Multi-Head Attention · Softmax · Byte Pair Encoding · Layer Normalization · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer