Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell

arXiv:2405.04669 · cs.LG · October 29, 2024

TL;DR

This paper provides a theoretical analysis of the 'reversal curse' in large language models, explaining how training dynamics produce asymmetric (effective) weights that cause failures on simple inverse-reasoning tasks.

Contribution

It introduces a theoretical framework analyzing the reversal curse via training dynamics of simplified models, offering new insights beyond previous expressivity-focused explanations.

Findings

01

The reversal curse results from asymmetric weight updates during training.

02

Training dynamics cause models to fail at logical inverse-reasoning tasks.

03

The theory is validated by experiments on multi-layer transformers.

Abstract

Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on '$A \to B$' (e.g., 'Tom is the parent of John'), an LLM fails to directly conclude '$B \leftarrow A$' (e.g., 'John is the child of Tom') during inference, even though the two sentences are semantically identical. This failure is known as the 'reversal curse'. In this paper, we theoretically analyze the reversal curse via the training dynamics of (stochastic) gradient descent for two auto-regressive models: (1) a bilinear model that can be viewed as a simplification of a one-layer transformer; (2) one-layer transformers under certain assumptions. Our analysis reveals that for both models, the reversal curse is a consequence of the (effective) model weights 'asymmetry', i.e., the increase of weights from…
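
To make the weight-asymmetry argument concrete, here is a minimal sketch in Python of a bilinear next-token model trained by gradient descent on a single fact 'A → B'. It assumes one-hot token embeddings, so the bilinear score reduces to a single entry theta[x][y] of a logit matrix, and it uses a hypothetical five-token vocabulary. The paper's bilinear model is more general, and a real reverse fact uses a different relation phrase ('child' vs. 'parent'), so treat this as an illustration of the mechanism rather than the authors' construction. The names softmax and train_bilinear are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bilinear(pairs, vocab_size, lr=1.0, steps=200):
    """Gradient descent on next-token cross-entropy for a toy bilinear model.

    Assumption (not from the paper): one-hot embeddings, so the bilinear
    score x^T Theta y is just theta[x][y].
    """
    theta = [[0.0] * vocab_size for _ in range(vocab_size)]
    for _ in range(steps):
        for x, y in pairs:
            probs = softmax(theta[x])
            # d(-log p(y|x)) / d theta[x][k] = probs[k] - 1[k == y];
            # only row x (the context token) receives any gradient.
            for k in range(vocab_size):
                grad = probs[k] - (1.0 if k == y else 0.0)
                theta[x][k] -= lr * grad
    return theta


# Hypothetical toy vocabulary: 0 = "Tom" (A), 1 = "John" (B), 2..4 = other tokens.
A, B, V = 0, 1, 5
theta = train_bilinear([(A, B)], V)
p_forward = softmax(theta[A])[B]  # p(B | A): learned from the training pair
p_reverse = softmax(theta[B])[A]  # p(A | B): row B was never updated
print(f"p(B|A) = {p_forward:.3f}, p(A|B) = {p_reverse:.3f}")
```

Because the cross-entropy gradient for a pair (x, y) only touches row x of theta, training pushes p(B|A) toward 1 while row B is never updated, so p(A|B) stays at the uniform value 1/V = 0.2. In this one-hot toy the asymmetry is exact; the paper studies the same effect for its bilinear model and for one-layer transformers.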

Code & Models

Repositories

marlo-z/reversal_curse_analysis
PyTorch · Official

Videos

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics (SlidesLive)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification