Exploring the Knowledge Mismatch Hypothesis: Hallucination Propensity in Small Models Fine-tuned on Data from Larger Models
Phil Wee, Riyadh Baghdadi

TL;DR
This paper investigates whether fine-tuning small language models on data generated by larger models increases hallucination through a knowledge mismatch. It reports that such models make more factual errors than models fine-tuned on data from a smaller model.
Contribution
It provides empirical evidence that fine-tuning on larger model data causes more hallucinations in small models, supporting the knowledge mismatch hypothesis.
Findings
Small models fine-tuned on larger model data produce more incorrect answers.
Knowledge mismatch during fine-tuning increases hallucination propensity.
Empirical validation on unseen test sets confirms the hypothesis.
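The comparison behind these findings can be illustrated with a minimal sketch. It assumes hallucination is approximated by the fraction of incorrect answers on a held-out question set, scored by normalized exact match. The function names and the toy answers below are hypothetical and made up for illustration; they are not the authors' evaluation code, data, or results.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def wrong_answer_rate(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that do not match the reference answer (exact match)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one test example")
    wrong = sum(normalize(p) != normalize(r) for p, r in zip(predictions, references))
    return wrong / len(references)


# Toy, hypothetical data: reference answers for an unseen test set and the
# answers given by two small models that differ only in their fine-tuning data.
references = ["paris", "4", "mars", "oxygen"]
preds_small_teacher = ["Paris", "4", "Venus", "oxygen"]  # fine-tuned on smaller-model data
preds_large_teacher = ["Paris", "5", "Venus", "oxygen"]  # fine-tuned on larger-model data

rate_small = wrong_answer_rate(preds_small_teacher, references)
rate_large = wrong_answer_rate(preds_large_teacher, references)
print(f"wrong-answer rate (smaller-model data): {rate_small:.2f}")
print(f"wrong-answer rate (larger-model data):  {rate_large:.2f}")
```

Exact match is a deliberately crude stand-in. An evaluation of free-form answers would likely need a more tolerant correctness check, and the paper's actual metric may differ.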
Abstract
Recently, there has been an explosion of large language models created through fine-tuning with data from larger models. These small models are able to produce outputs that appear qualitatively similar to those of significantly larger models. However, one of the key limitations that has been observed with these models is their propensity to hallucinate significantly more often than larger models. In particular, they have been observed to generate coherent outputs that involve factually incorrect information and spread misinformation, toxicity, and stereotypes. There are many potential causes of hallucination. One hypothesis is that fine-tuning a model on data produced by a larger model leads to a knowledge mismatch, which contributes to hallucination. In particular, it is hypothesized that there is a mismatch between the knowledge that is fed to the model to fine-tune it and the knowledge…
Taxonomy
Topics: Complex Systems and Time Series Analysis
