Mitigating Large Language Model Hallucination with Faithful Finetuning
Minda Hu, Bowei He, Yufei Wang, Liangyou Li, Chen Ma, Irwin King

TL;DR
This paper introduces Faithful Finetuning (F2), a new method that explicitly reduces hallucinations in large language models by using specialized loss functions during fine-tuning, leading to more truthful responses.
Contribution
The paper proposes Faithful Finetuning (F2), a novel fine-tuning approach that explicitly models faithfulness to mitigate hallucinations in large language models.
Findings
F2 significantly reduces hallucinations compared to baseline models.
F2 improves factual accuracy on multiple datasets.
F2 maintains fluency while enhancing truthfulness.
Abstract
Large language models (LLMs) have demonstrated remarkable performance on various natural language processing tasks. However, they are prone to generating fluent yet untruthful responses, known as "hallucinations". Hallucinations can lead to the spread of misinformation and cause harm in critical applications. Mitigating hallucinations is challenging because they arise from factors such as noisy data, model overconfidence, lack of knowledge, and the generation process itself. Recent efforts have attempted to address this issue through representation editing and decoding algorithms, reducing hallucinations without major structural changes or retraining. However, these approaches either implicitly edit LLMs' behavior in latent space or suppress the tendency to output unfaithful results during decoding, rather than explicitly modeling hallucination. In this work, we introduce Faithful…
