How Does Unfaithful Reasoning Emerge from Autoregressive Training? A Study of Synthetic Experiments

Fuxin Wang; Amr Alazali; Yiqiao Zhong

arXiv:2602.01017 · cs.LG · February 3, 2026

TL;DR

This study investigates how unfaithful reasoning emerges in autoregressively trained large language models. It reveals a critical level of training noise that determines whether models learn faithful or unfaithful reasoning patterns, with implications for understanding and improving model reasoning.

Contribution

The paper introduces synthetic experiments to analyze the emergence of unfaithful reasoning in autoregressive models, highlighting the role of training noise and internal uncertainty encoding.

Findings

1. Models learn faithful reasoning below a critical noise threshold.
2. High noise levels cause a transition to unfaithful skip-step reasoning.
3. Models encode internal uncertainty, enabling implicit self-verification.

Abstract

Chain-of-thought (CoT) reasoning generated by large language models (LLMs) is often unfaithful: intermediate steps can be logically inconsistent or fail to reflect the causal relationship leading to the final answer. Despite extensive empirical observations, a fundamental understanding of CoT is lacking: what constitutes faithful CoT reasoning, and how does unfaithfulness emerge from autoregressive training? We study these questions using well-controlled synthetic experiments, training small transformers on noisy data to solve modular arithmetic expressions step by step, a task we term Arithmetic Expression Reasoning. We find that models can learn faithful reasoning that causally follows the underlying arithmetic rules, but only when the training noise is below a critical threshold, a phenomenon attributable to simplicity bias. At higher noise levels, training dynamics exhibit a transition…
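To make the Arithmetic Expression Reasoning setup concrete, here is a minimal sketch of how step-by-step training traces and a simple faithfulness check might look. The abstract does not specify the modulus, expression format, or noise model, so everything below is an assumption for illustration: the modulus P = 7, flat expressions reduced strictly left to right (no operator precedence or parentheses), and a noise model that replaces a step's newly computed value with a random residue. All function names (`random_expression`, `clean_trace`, `noisy_trace`, `is_faithful`) are hypothetical and are not the authors' code.

```python
import random

P = 7  # hypothetical modulus; the paper's actual modulus is not given here

# Modular operators (assumed set: +, -, *). Python's % always returns a value in [0, P).
OPS = {
    "+": lambda a, b: (a + b) % P,
    "-": lambda a, b: (a - b) % P,
    "*": lambda a, b: (a * b) % P,
}


def random_expression(n_operands, rng):
    """Return a flat expression such as [3, '+', 5, '*', 2] with operands in [0, P)."""
    expr = [rng.randrange(P)]
    for _ in range(n_operands - 1):
        expr += [rng.choice(list(OPS)), rng.randrange(P)]
    return expr


def reduce_once(expr):
    """One reasoning step (assumed left-to-right): [a, op, b, ...] -> [op(a, b), ...]."""
    a, op, b = expr[0], expr[1], expr[2]
    return [OPS[op](a, b)] + expr[3:]


def clean_trace(expr):
    """Noise-free step-by-step trace: every intermediate expression down to one value."""
    trace = [list(expr)]
    while len(trace[-1]) > 1:
        trace.append(reduce_once(trace[-1]))
    return trace


def noisy_trace(expr, noise_rate, rng):
    """Hypothetical noise model: with probability noise_rate, a step's new value is
    replaced by a uniformly random residue (which may coincide with the correct one).
    Later steps are computed from the corrupted state."""
    trace = [list(expr)]
    while len(trace[-1]) > 1:
        nxt = reduce_once(trace[-1])
        if rng.random() < noise_rate:
            nxt[0] = rng.randrange(P)
        trace.append(nxt)
    return trace


def is_faithful(trace):
    """True if every step is exactly one correct reduction of the previous state.
    A skip-step trace (jumping straight to an answer) fails this check."""
    return all(reduce_once(prev) == nxt for prev, nxt in zip(trace, trace[1:]))


if __name__ == "__main__":
    rng = random.Random(0)
    expr = random_expression(4, rng)
    print(clean_trace(expr))
    print(is_faithful(clean_trace(expr)))  # always True for a clean trace
    print(noisy_trace(expr, noise_rate=0.3, rng=rng))
```

In this framing, a faithful model would reproduce `clean_trace`, while the skip-step behavior described in the findings would correspond to emitting the final value without the intermediate states, which `is_faithful` rejects. Varying `noise_rate` is one way to mimic the training-noise axis the abstract describes, though the paper's actual noise construction may differ.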
