Provable Knowledge Acquisition and Extraction in One-Layer Transformers
Ruichen Xu, Kexin Chen

TL;DR
This paper provides a theoretical analysis of how one-layer transformers acquire and extract factual knowledge, revealing the conditions under which facts are stored and retrieved, and explaining hallucination phenomena.
Contribution
It introduces a formal framework for understanding knowledge acquisition and extraction in simplified transformer models, linking pre-training, fine-tuning, and factual recall mechanisms.
Findings
Pre-training learns structured attention and relation-specific features.
Fine-tuning can trigger fact extraction without revisiting all subject-answer pairs.
Knowledge extraction depends on relation coverage and pre-training multiplicity.
Abstract
Large language models may encounter factual knowledge during pre-training yet fail to reliably use that knowledge after fine-tuning. Despite growing empirical evidence that MLP layers store factual associations and fine-tuning affects factual recall, the training-dynamics mechanisms linking next-token pre-training, knowledge storage, and post-fine-tuning extraction remain poorly understood. We study this problem in a stylized one-layer transformer with self-attention and MLP modules, trained by next-token prediction and subsequently fine-tuned on question-answering data. Under suitable regularity conditions, we first prove that the model reaches near-optimal pre-training loss while learning structured attention patterns and relation-specific feature directions, giving a mechanism for factual knowledge acquisition. We then show that fine-tuning can turn the Q&A prompt format into a…
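To make the setup in the abstract concrete, the following is a minimal sketch of a one-layer transformer (causal self-attention followed by an MLP) with a next-token prediction loss. It is an illustration under stated assumptions, not the paper's parameterization: the parameter names (`E`, `W_qk`, `W_v`, `W_in`, `W_out`, `U`), the merged query-key matrix, the residual connections, the ReLU activation, and the toy dimensions are all hypothetical choices made here.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def init_params(vocab, d_model, d_mlp, seed=0):
    # Small random weights; all names and shapes are illustrative assumptions.
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "E": rng.normal(0, s, (vocab, d_model)),       # token embeddings
        "W_qk": rng.normal(0, s, (d_model, d_model)),  # merged query-key matrix
        "W_v": rng.normal(0, s, (d_model, d_model)),   # value matrix
        "W_in": rng.normal(0, s, (d_model, d_mlp)),    # MLP input weights
        "W_out": rng.normal(0, s, (d_mlp, d_model)),   # MLP output weights
        "U": rng.normal(0, s, (d_model, vocab)),       # unembedding
    }

def forward(tokens, p):
    X = p["E"][tokens]                                  # (T, d_model)
    T = len(tokens)
    scores = X @ p["W_qk"] @ X.T                        # (T, T) attention scores
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # positions after the query
    scores = np.where(future, -np.inf, scores)          # causal mask
    attn = np.exp(log_softmax(scores))                  # each row sums to 1
    H = X + attn @ X @ p["W_v"]                         # attention + residual
    M = np.maximum(H @ p["W_in"], 0.0) @ p["W_out"]     # ReLU MLP
    logits = (H + M) @ p["U"]                           # (T, vocab)
    return logits, attn

def next_token_loss(tokens, p):
    # Average cross-entropy of predicting token t+1 from tokens up to t.
    tokens = np.asarray(tokens)
    logits, _ = forward(tokens[:-1], p)
    logp = log_softmax(logits)
    targets = tokens[1:]
    return -logp[np.arange(len(targets)), targets].mean()
```

As a usage example, a fact could be encoded as a hypothetical three-token sequence such as `[subject_id, relation_id, answer_id]`; `next_token_loss` then scores how well the model predicts the answer token from the subject and relation. Fine-tuning on question-answering data would reuse the same loss on prompts in a different format.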
