Comparing Knowledge Injection Methods for LLMs in a Low-Resource Regime
Hugo Abonizio, Thales Almeida, Roberto Lotufo, Rodrigo Nogueira

TL;DR
This paper investigates methods for injecting small amounts of new knowledge into large language models. It compares data augmentation techniques, analyzes the trade-off between learning new facts and forgetting existing capabilities, and offers practical insights and code resources.
Contribution
It introduces diverse augmentation algorithms for knowledge injection in LLMs, analyzes the effects of variability and forgetting, and demonstrates self-generated synthetic data as a promising approach.
Findings
Diverse prompting improves knowledge acquisition more than simple continued pre-training.
Exposing models to varied textual prompts enhances learning of new facts.
RAG-based methods can cause greater degradation on control datasets.
Abstract
Large language models (LLMs) often require vast amounts of text to effectively acquire new knowledge. While continuing pre-training on large corpora or employing retrieval-augmented generation (RAG) has proven successful, updating an LLM with only a few thousand or million tokens remains challenging. In this work, we investigate the task of injecting small, unstructured information into LLMs and its relation to the catastrophic forgetting phenomenon. We use a dataset of recent news -- ensuring no overlap with the model's pre-training data -- to evaluate knowledge acquisition by probing the model with question-answer pairs related to the learned information. Starting from a continued pre-training baseline, we explore different augmentation algorithms that generate synthetic data to improve knowledge acquisition. Our experiments show that simply continuing pre-training on…
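As a rough illustration of what "diverse prompting" for augmentation could look like, the sketch below builds several differently worded rewriting prompts for each source document. It is not the authors' implementation: the templates, the style list, and the function name `build_augmentation_prompts` are all hypothetical, and the step that sends each prompt to an LLM to produce synthetic training text is left out.

```python
import itertools
import random

# Hypothetical prompt templates and styles, invented for illustration only.
TEMPLATES = [
    "Rewrite the following news article as a {style}:\n\n{document}",
    "Using only the facts in the text below, write a {style}:\n\n{document}",
    "Produce a {style} that covers every fact in this article:\n\n{document}",
]
STYLES = [
    "short summary",
    "list of question-answer pairs",
    "dialogue between two people",
    "set of bullet-point notes",
]


def build_augmentation_prompts(documents, n_per_doc, seed=0):
    """Return n_per_doc distinct rewriting prompts for each document.

    Each prompt pairs one template with one style, so a single document
    is presented under several different instructions.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    combos = list(itertools.product(TEMPLATES, STYLES))
    prompts = []
    for doc in documents:
        # Sample without replacement so prompts for one document differ.
        chosen = rng.sample(combos, k=min(n_per_doc, len(combos)))
        for template, style in chosen:
            prompts.append(template.format(style=style, document=doc))
    return prompts
```

Each returned prompt would then be passed to a generator model, and the generated texts would form the synthetic corpus used for continued pre-training. Sampling template and style pairs without replacement is one simple way to guarantee variety per document; other sources of variability are equally possible.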
Taxonomy
Topics: Neural Networks and Applications · Mineral Processing and Grinding · Advanced Data Storage Technologies
