Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning
Tianyi Wu, Jingwei Ni, Bryan Hooi, Jiaheng Zhang, Elliott Ash, See-Kiong Ng, Mrinmaya Sachan, Markus Leippold

TL;DR
This paper explores the challenge of balancing truthfulness and informativeness in instruction fine-tuning of large language models, proposing two new methods to improve truthfulness and manage uncertainty.
Contribution
The paper introduces two instruction fine-tuning (IFT) paradigms, $UNIT_{cut}$ and $UNIT_{ref}$, that target the trade-off between truthfulness and informativeness: one improves truthfulness, the other teaches LLMs to recognize and flag their own uncertainty.
Findings
$UNIT_{cut}$ improves model truthfulness.
$UNIT_{ref}$ maintains informativeness and reduces hallucinations.
Unfamiliar knowledge in datasets can harm truthfulness.
Abstract
Instruction fine-tuning (IFT) can increase the informativeness of large language models (LLMs), but may reduce their truthfulness. This trade-off arises because IFT steers LLMs to generate responses containing long-tail knowledge that was not well covered during pre-training. As a result, models become more informative but less accurate when generalizing to unseen tasks. In this paper, we empirically demonstrate how unfamiliar knowledge in IFT datasets can negatively affect the truthfulness of LLMs, and we introduce two new IFT paradigms, $UNIT_{cut}$ and $UNIT_{ref}$, to address this issue. $UNIT_{cut}$ identifies and removes unfamiliar knowledge from IFT datasets to mitigate its impact on model truthfulness, whereas $UNIT_{ref}$ trains LLMs to recognize their uncertainty and explicitly indicate it at the end of their responses. Our experiments show that substantially…
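To make the difference between the two paradigms concrete, here is a toy Python sketch of how an IFT target response might be rewritten under each one. It is an illustration only, not the authors' implementation: the `Claim` type, the `familiarity` score, the `threshold` value, and the wording of the appended note are all assumptions, and the abstract does not say how unfamiliar knowledge is actually detected.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    # Hypothetical score in [0, 1]; higher means the base model is assumed
    # to know this claim better from pre-training. Not defined in the paper summary.
    familiarity: float


def unit_cut(claims, threshold=0.5):
    """Cut-style target: drop claims assumed to be unfamiliar to the model."""
    return " ".join(c.text for c in claims if c.familiarity >= threshold)


def unit_ref(claims, threshold=0.5):
    """Ref-style target: keep every claim, then flag the uncertain ones at the end."""
    body = " ".join(c.text for c in claims)
    uncertain = [c.text for c in claims if c.familiarity < threshold]
    if not uncertain:
        return body
    # The phrasing of this note is an assumption made for illustration.
    return body + " I am unsure about the following: " + " ".join(uncertain)


example = [
    Claim("Paris is the capital of France.", 0.95),
    Claim("The town hall was rebuilt in 1874.", 0.20),
]
cut_target = unit_cut(example)
ref_target = unit_ref(example)
```

In this sketch the cut-style target loses the low-familiarity sentence entirely, while the ref-style target keeps it and repeats it in a closing uncertainty note, which mirrors the informativeness difference the findings describe.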
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
