
TL;DR
Autolearn is a novel framework enabling language models to learn from documents without external supervision by using a self-verification process and a new metric to distinguish memorization from understanding.
Contribution
It introduces a Q&A-format training method and the perturbation gap metric, effectively reducing memorization and enhancing genuine knowledge acquisition in language models.
Findings
Autolearn reduces the perturbation gap below baseline levels, indicating less memorization.
Passage-specific knowledge acquisition increases the probability of generating correct facts from 6% to 54%.
Q&A format outperforms standard fine-tuning on genuinely novel facts.
Abstract
We propose Autolearn, a framework that enables language models to learn from documents they read, with no external supervision. Passages that produce anomalously high per-token loss are flagged, verified through a self-generated Q&A chain, and trained on with conviction-proportional adjustment. We introduce the perturbation gap (paraphrase-to-original perplexity ratio) as a metric that distinguishes memorization from understanding. The key mechanism is the training data format: Q&A-format training drives the perturbation gap below the pre-trained baseline (2.098 vs. 2.204), suppressing token-sequence memorization, while standard fine-tuning's best attempt remains within noise. Across four models spanning Qwen3 and Phi-4 families, Autolearn is the only method that enters this regime. Stochastic evaluation reveals…
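The abstract defines the perturbation gap only as a paraphrase-to-original perplexity ratio. As a minimal sketch of that definition, the Python below computes it from per-token log-probabilities. The function names, the use of natural-log probabilities, and the single-passage scope are assumptions made for illustration; the abstract does not say how paraphrases are produced or how gaps are aggregated across passages.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of one text from its per-token natural-log probabilities.

    Assumption: the log-probabilities come from scoring the text with the
    language model under evaluation; obtaining them is out of scope here.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def perturbation_gap(original_logprobs, paraphrase_logprobs):
    """Ratio of paraphrase perplexity to original-passage perplexity.

    Hypothetical helper: it follows the abstract's one-line definition and
    is not the authors' implementation.
    """
    return perplexity(paraphrase_logprobs) / perplexity(original_logprobs)


# Toy numbers, not results from the paper.
original = [-0.5, -0.5]
paraphrase = [-1.0, -1.0]
gap = perturbation_gap(original, paraphrase)
```

With these toy values the gap equals exp(1.0) / exp(0.5) = exp(0.5). Under the abstract's reading, a lower gap means the model predicts a paraphrase nearly as well as the original wording, which suggests it relies less on the exact token sequence.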
