Data Doping or True Intelligence? Evaluating the Transferability of Injected Knowledge in LLMs

Essa Jan; Moiz Ali; Muhammad Saram Hassan; Fareed Zaffar; Yasir Zaki

arXiv:2505.17140·cs.CL·May 26, 2025

TL;DR

This paper investigates how different fine-tuning tasks affect the retention and transferability of injected knowledge in large language models, highlighting the importance of task type and model size.

Contribution

The paper shows that comprehension-focused fine-tuning tasks yield better knowledge retention and transfer in LLMs than mapping tasks, and that this holds across architectures and model scales.

Findings

01 Comprehension tasks reach higher knowledge retention rates (48%) than mapping tasks (17-20%), despite identical factual content.

02 Larger models retain more injected knowledge across all task types.

03 Performance drops when injected knowledge must be applied in broader contexts, suggesting limited semantic integration.
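
The retention figures above can be read as per-task recall fractions. Here is a minimal sketch, assuming retention is measured as the share of injected facts a fine-tuned model later answers correctly. The function name `retention_rates`, the record format, and the toy records are hypothetical illustrations, not the paper's evaluation code or data.

```python
from collections import defaultdict


def retention_rates(results):
    """Compute the fraction of injected facts recalled, per fine-tuning task.

    `results` is an iterable of (task_type, recalled) pairs, where `recalled`
    is True if the model answered the probe for that fact correctly.
    This definition of retention is an assumption made for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for task, recalled in results:
        totals[task] += 1
        hits[task] += int(recalled)
    return {task: hits[task] / totals[task] for task in totals}


# Toy, made-up records (NOT results from the paper).
toy_results = [
    ("qa", True), ("qa", False), ("qa", True),
    ("translation", False), ("translation", False), ("translation", True),
]

rates = retention_rates(toy_results)
for task, rate in sorted(rates.items()):
    print(f"{task}: {rate:.0%}")
```

Grouping by task type makes the comparison in the first finding direct: the same set of facts is probed for every task, so differences in the per-task fractions reflect the fine-tuning task rather than the content.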

Abstract

As the knowledge of large language models (LLMs) becomes outdated over time, there is a growing need for efficient methods to update them, especially when injecting proprietary information. Our study reveals that comprehension-intensive fine-tuning tasks (e.g., question answering and blanks) achieve substantially higher knowledge retention rates (48%) compared to mapping-oriented tasks like translation (17%) or text-to-JSON conversion (20%), despite exposure to identical factual content. We demonstrate that this pattern persists across model architectures and follows scaling laws, with larger models showing improved retention across all task types. However, all models exhibit significant performance drops when applying injected knowledge in broader contexts, suggesting limited semantic integration. These findings show the importance of task selection in updating LLM knowledge, showing…
