Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell

TL;DR
This paper provides a theoretical analysis of the 'reversal curse' in large language models, explaining how training dynamics lead to asymmetrical weights that cause logical reasoning failures.
Contribution
It introduces a theoretical framework analyzing the reversal curse via training dynamics of simplified models, offering new insights beyond previous expressivity-focused explanations.
Findings
Reversal curse results from asymmetrical weight updates during training.
Training dynamics cause models to fail in logical inverse reasoning tasks.
Theory validated on multi-layer transformer experiments.
Abstract
Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on a statement in one direction (e.g., 'Tom is the parent of John'), an LLM fails to directly conclude the reversed statement (e.g., 'John is the child of Tom') during inference even if the two sentences are semantically identical, which is known as the 'reversal curse'. In this paper, we theoretically analyze the reversal curse via the training dynamics of (stochastic) gradient descent for two auto-regressive models: (1) a bilinear model that can be viewed as a simplification of a one-layer transformer; (2) one-layer transformers under certain assumptions. Our analysis reveals that for both models, the reversal curse is a consequence of the (effective) model weights 'asymmetry', i.e., the increase of weights from…
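The weight asymmetry described in the abstract can be illustrated with a toy example. The sketch below is not the paper's construction: it assumes one-hot (exactly orthogonal) token embeddings, a logit matrix `theta` with `logit(next = j | current = i) = theta[i, j]`, and plain SGD on the cross-entropy loss. The function name `train_bilinear`, the learning rate, and the token ids are all hypothetical choices made for illustration.

```python
import numpy as np

def train_bilinear(pairs, vocab_size, lr=0.5, steps=200):
    """Toy bilinear next-token model with one-hot embeddings (an assumption).

    theta[i, j] is the logit of predicting token j after token i.
    Each (a, b) pair is a training example "a is followed by b".
    """
    theta = np.zeros((vocab_size, vocab_size))
    for _ in range(steps):
        for a, b in pairs:
            logits = theta[a]
            # Numerically stable softmax over the next-token logits.
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            # Cross-entropy gradient w.r.t. row a: softmax minus one-hot target.
            grad = probs.copy()
            grad[b] -= 1.0
            # Only row a receives a gradient; row b is never touched.
            theta[a] -= lr * grad
    return theta

# Hypothetical token ids: 0 = "Tom", 1 = "John", 2 and 3 = another pair.
pairs = [(0, 1), (2, 3)]
theta = train_bilinear(pairs, vocab_size=4)

forward_weight = theta[0, 1]   # weight from "Tom" to "John", grows in training
reverse_weight = theta[1, 0]   # weight from "John" to "Tom", stays at its initial value
print(forward_weight, reverse_weight)
```

In this toy setting, training on the forward pair only updates the row of `theta` indexed by the first token, so the weight in the reverse direction never moves from its initial value. That is one simple way to see how gradient updates can be asymmetric, under the stated one-hot assumption.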
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification
