Latent Knowledge as a Predictor of Fact Acquisition in Fine-Tuned Large Language Models
Daniel B. Hier, Tayo Obafemi-Ajayi

TL;DR
This study investigates how latent knowledge in large language models influences the speed of fact learning and generalization, finding that latent knowledge predicts faster acquisition and makes generalization to unseen facts more likely.
Contribution
It introduces a novel analysis of latent knowledge as a predictor of fact acquisition and generalization in fine-tuned large language models using survival analysis methods.
Findings
Latent knowledge is the strongest predictor of faster fact acquisition (HR 2.6).
Fine-tuning raises deterministic recall for HPO from 2.8% to 71.9%.
Limited generalization occurs for unseen facts, but is more likely with latent knowledge.
Abstract
Large language models store biomedical facts with uneven strength after pretraining: some facts are present in the weights but are not reliably accessible under deterministic decoding (latent knowledge), while others are scarcely represented. We fine-tuned Llama 3.1 8B Instruct to learn ontology term-identifier mappings from the Human Phenotype Ontology (800 pairs) and the Gene Ontology (400 training pairs), withholding 400 GO pairs to test generalization. Treating learning as a time-to-event process across 20 epochs, we used stochastic decoding to detect latent knowledge at baseline and Cox proportional hazards models to identify predictors of acquisition, generalization, and degradation. Baseline deterministic recall for HPO was 2.8%, rising to 71.9% after fine-tuning. Latent knowledge was the strongest predictor of faster fact acquisition (HR 2.6) and was associated with earlier,…
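The time-to-event framing in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the "at least one matching sample" criterion for latent knowledge, and the example identifier are all assumptions made for illustration. It shows how per-epoch recall results for one fact might be turned into a (duration, event) pair with right-censoring at the last observed epoch, and how a baseline latent-knowledge flag might be derived from stochastic samples.

```python
from typing import List, Tuple


def time_to_acquisition(correct_by_epoch: List[bool]) -> Tuple[int, bool]:
    """Return (duration, event_observed) for one fact.

    duration is the 1-based epoch of the first correct deterministic recall.
    If the fact is never recalled, the fact is right-censored: duration is
    the number of epochs observed and event_observed is False.
    """
    for epoch, ok in enumerate(correct_by_epoch, start=1):
        if ok:
            return epoch, True
    return len(correct_by_epoch), False


def is_latent(greedy_correct: bool, sampled_answers: List[str], target: str) -> bool:
    """Hypothetical criterion (an assumption, not taken from the paper):
    the fact is latent if greedy decoding is wrong at baseline but at least
    one stochastically sampled answer matches the target identifier."""
    if greedy_correct:
        return False
    return any(answer.strip() == target for answer in sampled_answers)


# Illustrative data only; the identifier is used purely as an example string.
history = [False, False, True, True]          # learned at epoch 3
never_learned = [False] * 20                  # censored at epoch 20
samples = ["HP:0000001", "HP:0001250 ", "HP:0000118"]

print(time_to_acquisition(history))           # (3, True)
print(time_to_acquisition(never_learned))     # (20, False)
print(is_latent(False, samples, "HP:0001250"))  # True
```

Rows of (duration, event, latent flag) built this way are the usual input shape for a Cox proportional hazards fit, in which a hazard ratio above 1 for the latent flag would correspond to faster acquisition.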
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Genomics and Rare Diseases · Topic Modeling
