Inducing Epistemological Humility in Large Language Models: A Targeted SFT Approach to Reducing Hallucination
Cem Uluoglakci, Tugba Taskaya Temizel

TL;DR
This paper presents a targeted supervised fine-tuning dataset and benchmark to teach large language models epistemological humility, significantly reducing hallucinations while maintaining overall performance.
Contribution
Introduces the HypoTermInstruct dataset and the HypoTermQA-Enhanced benchmark, which use specialized fine-tuning to improve models' recognition of the limits of their knowledge.
Findings
Significant improvement in hallucination reduction metrics.
Maintained performance on general knowledge tasks.
Targeted data effectively teaches meta-cognitive skills.
Abstract
Large language models (LLMs) often hallucinate, producing fluent but false information, partly because supervised fine-tuning (SFT) implicitly rewards always responding. We introduce HypoTermInstruct, an SFT dataset (31,487 responses for 11,151 questions) designed to teach models epistemological humility: the ability to recognize the limits of their own knowledge and admit uncertainty. This is achieved through questions about non-existent "hypothetical" terms. We also release HypoTermQA-Enhanced, a benchmark for hallucination tendency strengthened through multiple validations. We conducted 800 controlled LoRA SFT runs across two models (each in base and instruct variants), testing 100 fine-tuning configurations with paired controls. Our results demonstrate that replacing generic instruction data with HypoTermInstruct significantly…
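The idea described in the abstract, pairing a question about a non-existent term with a response that admits uncertainty and then swapping such examples in for generic instruction data, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `SFTExample` record, the prompt and response templates, the example terms, and the `replace_generic` helper are hypothetical and are not taken from HypoTermInstruct or the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SFTExample:
    """One prompt/response pair for supervised fine-tuning (hypothetical schema)."""
    prompt: str
    response: str


def make_humility_example(hypothetical_term: str, real_term: str) -> SFTExample:
    """Build a question about a made-up term and a response that admits uncertainty.

    The wording is an illustrative template, not the dataset's actual phrasing.
    """
    prompt = f"How does {hypothetical_term} relate to {real_term}?"
    response = (
        f'I am not familiar with "{hypothetical_term}" and cannot verify that it exists. '
        f"I can explain {real_term} if that would help."
    )
    return SFTExample(prompt=prompt, response=response)


def replace_generic(generic: List[SFTExample],
                    humility: List[SFTExample],
                    n: int) -> List[SFTExample]:
    """Swap the first n generic examples for n humility examples.

    Replacing rather than appending keeps the total number of training
    examples fixed, so a run with the swap can be compared against a
    control run that uses only generic data.
    """
    if n < 0 or n > len(generic) or n > len(humility):
        raise ValueError("n must not exceed the size of either list")
    return humility[:n] + generic[n:]


# Usage: one invented term ("quantum lattice caching") paired with a real one.
example = make_humility_example("quantum lattice caching", "database indexing")
generic_data = [SFTExample(f"generic prompt {i}", f"generic response {i}") for i in range(3)]
mixed = replace_generic(generic_data, [example], n=1)
```

Keeping the dataset size constant in `replace_generic` is one way to read "replacing generic instruction data" in the abstract; the actual mixing procedure and ratios used in the paper are not stated on this page.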
Taxonomy
Topics: Misinformation and Its Impacts · Artificial Intelligence in Healthcare and Education · Topic Modeling
