How Does Unfaithful Reasoning Emerge from Autoregressive Training? A Study of Synthetic Experiments
Fuxin Wang, Amr Alazali, Yiqiao Zhong

TL;DR
This study investigates how unfaithful reasoning emerges in autoregressively trained language models. Using synthetic experiments with small transformers, it identifies a critical training-noise threshold: below it, models learn faithful step-by-step reasoning; above it, they shift to unfaithful reasoning.
Contribution
The paper introduces synthetic experiments to analyze the emergence of unfaithful reasoning in autoregressive models, highlighting the role of training noise and internal uncertainty encoding.
Findings
Models learn faithful reasoning below a critical noise threshold.
High noise levels cause a transition to unfaithful skip-step reasoning.
Models encode internal uncertainty, enabling implicit self-verification.
Abstract
Chain-of-thought (CoT) reasoning generated by large language models (LLMs) is often unfaithful: intermediate steps can be logically inconsistent or fail to reflect the causal relationship leading to the final answer. Despite extensive empirical observations, a fundamental understanding of CoT is lacking--what constitutes faithful CoT reasoning, and how unfaithfulness emerges from autoregressive training. We study these questions using well-controlled synthetic experiments, training small transformers on noisy data to solve modular arithmetic expressions step by step, a task we term Arithmetic Expression Reasoning. We find that models can learn faithful reasoning that causally follows the underlying arithmetic rules, but only when the training noise is below a critical threshold, a phenomenon attributable to simplicity bias. At higher noise levels, training dynamics exhibit a transition…
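To make the setup in the abstract concrete, here is a minimal sketch of how a noisy step-by-step trace for a modular arithmetic expression might be generated. The listing does not give the paper's data format, so everything here is an assumption made for illustration: the function name `make_trace`, left-to-right evaluation without operator precedence, the operator set, the modulus, and the noise model (each written intermediate step is independently replaced by a uniformly random residue with probability `noise`).

```python
import random


def make_trace(num_operands=4, modulus=7, noise=0.1, rng=None):
    """Build one step-by-step trace for a modular arithmetic expression.

    Hypothetical data format, not the paper's. Returns a tuple
    (operands, ops, clean_steps, written_steps), where clean_steps holds
    the true running value after each operator and written_steps is the
    possibly corrupted version a model would be trained on.
    """
    if rng is None:
        rng = random.Random()
    operands = [rng.randrange(modulus) for _ in range(num_operands)]
    ops = [rng.choice("+-*") for _ in range(num_operands - 1)]

    value = operands[0]
    clean_steps, written_steps = [], []
    for op, x in zip(ops, operands[1:]):
        # Assumption: evaluate strictly left to right, reducing mod `modulus`.
        if op == "+":
            value = (value + x) % modulus
        elif op == "-":
            value = (value - x) % modulus
        else:
            value = (value * x) % modulus
        clean_steps.append(value)

        # Assumed noise model: corrupt only the written step; the true
        # running value is carried forward unchanged.
        if rng.random() < noise:
            written_steps.append(rng.randrange(modulus))
        else:
            written_steps.append(value)
    return operands, ops, clean_steps, written_steps


if __name__ == "__main__":
    operands, ops, clean, written = make_trace(rng=random.Random(0))
    print("operands:", operands)
    print("ops:     ", ops)
    print("clean:   ", clean)
    print("written: ", written)
```

With `noise=0.0` the written steps equal the clean steps, so every intermediate value follows from the previous one. Raising `noise` makes the written steps less reliable as evidence for the final answer, which is the regime in which the abstract reports a shift away from faithful reasoning.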
Taxonomy
Topics: Language and cultural evolution · Embodied and Extended Cognition · Child and Animal Learning Development
