Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning
Nick Mecklenburg, Yiyou Lin, Xiaoxiao Li, Daniel Holstein, Leonardo Nunes, Sara Malvar, Bruno Silva, Ranveer Chandra, Vijay Aski, Pavan Kumar Reddy Yannam, Tolga Aktas, Todd Hendry

TL;DR
This paper explores supervised fine-tuning methods to effectively inject new, out-of-domain knowledge into large language models, focusing on recent sporting events, and compares dataset generation strategies for improved factual accuracy.
Contribution
It introduces a novel dataset generation process and systematically compares token-based and fact-based scaling for knowledge injection in LLMs.
Findings
Fact-based scaling ensures more uniform knowledge coverage.
Token-based scaling improves Q&A accuracy but lacks coverage consistency.
The proposed method enhances LLM factuality in specific domains.
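The coverage contrast in the findings above can be illustrated with a toy sketch. This is not the authors' pipeline: the function names, the example facts, and the use of random sampling as a stand-in for token-based generation are all assumptions made for illustration. The sketch only shows why giving every fact a fixed budget yields uniform coverage, while growing the dataset without tracking facts need not.

```python
import random
from collections import Counter


def token_based_sample(facts, n_examples, seed=0):
    """Toy stand-in for token-based scaling (an assumption, not the paper's code).

    Grows the dataset to n_examples items without tracking which fact each
    item covers, modeled here as uniform random draws over the facts.
    """
    rng = random.Random(seed)
    return [rng.choice(facts) for _ in range(n_examples)]


def fact_based_sample(facts, per_fact):
    """Toy stand-in for fact-based scaling (an assumption, not the paper's code).

    Emits exactly per_fact training items for every fact, so coverage is
    uniform by construction.
    """
    return [fact for fact in facts for _ in range(per_fact)]


def coverage(facts, examples):
    """Count how many training items mention each fact."""
    counts = Counter(examples)
    return {fact: counts.get(fact, 0) for fact in facts}


if __name__ == "__main__":
    # Hypothetical placeholder facts; real data would come from source documents.
    facts = ["fact_a", "fact_b", "fact_c", "fact_d"]
    print("token-based:", coverage(facts, token_based_sample(facts, 12)))
    print("fact-based: ", coverage(facts, fact_based_sample(facts, 3)))
```

Both calls produce 12 training items, but only the fact-based variant guarantees that each fact appears the same number of times. In a real setting each item would be a generated Q&A pair or paraphrase rather than the fact string itself.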
Abstract
In recent years, Large Language Models (LLMs) have shown remarkable performance in generating human-like text, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge remains a challenge, particularly for facts and events that occur after the model's knowledge cutoff date. This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in LLMs, specifically focusing on the domain of recent sporting events. We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information. Our experiments on GPT-4 demonstrate that while token-based scaling can lead to improvements in Q&A accuracy, it may not provide uniform coverage of new knowledge. Fact-based scaling, on the other hand,…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Dense Connections · Label Smoothing
