The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, Mike Rabbat, Mark Ibrahim

TL;DR
This paper identifies the factorization curse as a key reason behind retrieval failures and hallucinations in language models. Objectives such as next-token prediction inherently struggle when knowledge must be retrieved in a different token order than the one seen in training. The paper proposes factorization-agnostic objectives as a remedy.
Contribution
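The paper introduces the concept of the factorization curse and demonstrates its impact through controlled experiments of increasing realism, including WikiReversal, a new setting that simulates knowledge-intensive finetuning. It also shows that factorization-agnostic objectives can mitigate the problem.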
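This summary does not say which factorization-agnostic objectives the paper uses. As an assumed illustration only, one objective in this spirit is masked prediction with a masking rate drawn uniformly at random for each example. With a fixed low rate (BERT uses 15%), predicting most of a sequence from a small part of it is rarely trained. A uniformly drawn rate gives every subset of positions a chance to be predicted from its complement. The names `uniform_rate_mask` and `MASK` are hypothetical.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"  # hypothetical mask token string, chosen for illustration


def uniform_rate_mask(tokens: List[str], rng: random.Random) -> Tuple[List[str], List[int]]:
    """Corrupt one training example for a masked-prediction objective (sketch).

    A masking rate r is drawn uniformly from [0, 1) for this example, then each
    position is masked independently with probability r. At least one position
    is always masked so the example has a prediction target. Returns the
    corrupted input and the sorted masked positions (where a loss would apply).
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    rate = rng.random()
    positions = [i for i in range(len(tokens)) if rng.random() < rate]
    if not positions:
        positions = [rng.randrange(len(tokens))]
    masked = set(positions)
    corrupted = [MASK if i in masked else tok for i, tok in enumerate(tokens)]
    return corrupted, positions


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["Paris", "is", "the", "capital", "of", "France"]
    for _ in range(3):
        corrupted, targets = uniform_rate_mask(sentence, rng)
        print(corrupted, "-> predict", [sentence[i] for i in targets])
```

Across many draws, some examples hide only one token and others hide nearly all of them. The model is therefore trained on conditionals in many directions rather than only left to right.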
Findings
The factorization curse is an inherent failure of the next-token prediction objective.
Scale and naive bidirectional training do not solve the reversal curse.
Factorization-agnostic objectives improve knowledge retrieval across tasks.
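The first and third findings can be illustrated with a deliberately tiny, hypothetical lookup-table "model" (`CountingModel` is not from the paper). It can only answer conditionals it has seen during training. One copy is trained only in left-to-right order, as in next-token prediction. The other is trained on every ordering of the positions, an assumed permutation-style objective that is not necessarily the paper's method. A lookup table cannot generalize at all, so this exaggerates the effect. It shows only which conditionals each objective trains directly.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import DefaultDict, Dict, FrozenSet, List, Optional, Sequence, Tuple

# A context is the set of (position, token) pairs that are visible.
Context = FrozenSet[Tuple[int, str]]


class CountingModel:
    """Hypothetical lookup-table 'model': answers only conditionals seen in training."""

    def __init__(self) -> None:
        self.table: DefaultDict[Tuple[Context, int], Counter] = defaultdict(Counter)

    def train(self, seq: List[str], orders: Sequence[Tuple[int, ...]]) -> None:
        # For each factorization order, count the token at `pos` given the
        # tokens that come earlier in that order.
        for order in orders:
            for t, pos in enumerate(order):
                ctx = frozenset((p, seq[p]) for p in order[:t])
                self.table[(ctx, pos)][seq[pos]] += 1

    def predict(self, context: Dict[int, str], pos: int) -> Optional[str]:
        # dict.get does not trigger the defaultdict factory, so unseen queries give None.
        counts = self.table.get((frozenset(context.items()), pos))
        return counts.most_common(1)[0][0] if counts else None


facts = [["Paris", "capital_of", "France"], ["Tokyo", "capital_of", "Japan"]]

next_token = CountingModel()  # trained only on the left-to-right order
any_order = CountingModel()   # trained on every order of the three positions
for fact in facts:
    next_token.train(fact, [(0, 1, 2)])
    any_order.train(fact, list(permutations(range(3))))

forward_query = ({0: "Paris", 1: "capital_of"}, 2)   # "Paris capital_of ?"
reverse_query = ({1: "capital_of", 2: "France"}, 0)  # "? capital_of France"

print(next_token.predict(*forward_query), next_token.predict(*reverse_query))  # France None
print(any_order.predict(*forward_query), any_order.predict(*reverse_query))    # France Paris
```

Both models answer the forward query. Only the model trained across orderings answers the reversed one. The toy says nothing about scale or naive bidirectional training, which the paper reports do not solve the reversal curse.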
Abstract
Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorization curse: a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism, including WikiReversal, a setting we introduce to closely simulate a knowledge-intensive finetuning task, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate that reliable information retrieval cannot be solved with…
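The reframing rests on the chain rule: every ordering of the tokens factorizes the same joint distribution. The notation below ($x_t$, the permutation $\sigma$, tokens $A$ and $B$) is ours, not taken from the paper. Suppose a left-to-right objective is trained on text where $A$ precedes $B$. It fits $p(B \mid A)$. A reversed query asks for $p(A \mid B)$, a conditional from a different factorization of the same joint that this objective never trains directly.

```latex
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_{<t})
                   = \prod_{t=1}^{T} p\big(x_{\sigma(t)} \mid x_{\sigma(1)}, \dots, x_{\sigma(t-1)}\big)
\quad \text{for any permutation } \sigma \text{ of } \{1, \dots, T\}

% Two-token case, with x_1 = A and x_2 = B:
p(A, B) = p(A)\, p(B \mid A) = p(B)\, p(A \mid B)
```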
