What Models Know, How Well They Know It: Knowledge-Weighted Fine-Tuning for Learning When to Say "I Don't Know"

Joosung Lee; Hwiyeol Jo; Donghyeon Ko; Kyubyung Chae; Cheonbok Park; Jeonghoon Kim

arXiv:2604.05779 · cs.CL · April 8, 2026

TL;DR

This paper introduces a knowledge-weighted fine-tuning method for large language models. It scales the learning signal by how well the model already knows each training instance and trains the model to answer "I don't know" on out-of-scope queries, with the aim of reducing hallucinations.

Contribution

The authors estimate a fine-grained, instance-level knowledge score via multi-sampled inference and use it to scale the learning signal during fine-tuning, so that the model explicitly expresses uncertainty when it lacks knowledge while maintaining accuracy on questions it can answer.
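To make the idea of an instance-level knowledge score concrete, here is a minimal sketch, not the authors' implementation. It assumes the score is the fraction of sampled answers that match the reference after light normalization; the page does not give the exact definition. The names `knowledge_score`, `normalize`, and `sample_answer` are hypothetical, and `sample_answer` stands in for a stochastic call to the model.

```python
from itertools import cycle
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def knowledge_score(
    question: str,
    reference: str,
    sample_answer: Callable[[str], str],  # hypothetical: one sampled model answer
    num_samples: int = 8,
) -> float:
    """Assumed definition: share of sampled answers that match the reference."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    target = normalize(reference)
    hits = sum(
        normalize(sample_answer(question)) == target for _ in range(num_samples)
    )
    return hits / num_samples


if __name__ == "__main__":
    # Stub sampler that cycles through canned answers instead of calling a model.
    canned = cycle(["Paris", "paris", "Lyon", "Paris"])
    score = knowledge_score(
        "What is the capital of France?", "Paris", lambda q: next(canned)
    )
    print(score)  # 0.75: six of the eight sampled answers match
```

Under this assumed definition, a score near 1 marks an instance the model already answers reliably, and a score near 0 marks one it does not know. Exact string match is a simplification; a real pipeline would likely need a more tolerant answer comparison.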

Findings

01 Models can explicitly say "I don't know" for out-of-scope queries.

02 The approach improves the model's ability to distinguish known from unknown instances.

03 The proposed uncertainty metrics show that accurately discriminating known from unknown instances consistently improves performance.

Abstract

While large language models (LLMs) demonstrate strong capabilities across diverse user queries, they still suffer from hallucinations, often arising from knowledge misalignment between pre-training and fine-tuning. To address this misalignment, we reliably estimate a fine-grained, instance-level knowledge score via multi-sampled inference. Using the knowledge score, we scale the learning signal according to the model's existing knowledge, while encouraging explicit "I don't know" responses for out-of-scope queries. Experimental results show that this approach allows the model to explicitly express uncertainty when it lacks knowledge, while maintaining accuracy on questions it can answer. Furthermore, we propose evaluation metrics for uncertainty, showing that accurate discrimination between known and unknown instances consistently improves performance.
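The abstract describes scaling the learning signal by the knowledge score and encouraging "I don't know" responses for out-of-scope queries. The sketch below shows one way this could look; it is an illustration under stated assumptions, not the paper's formulation. It assumes that instances at or below a threshold are retargeted to a refusal string with full weight, that all other instances keep their reference answer weighted by the score itself, and that the batch loss is a weight-normalized average of per-example negative log-likelihoods. `Example`, `build_target`, `weighted_loss`, `IDK_RESPONSE`, and `idk_threshold` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

IDK_RESPONSE = "I don't know"  # assumed refusal string


@dataclass
class Example:
    question: str
    answer: str
    knowledge_score: float  # in [0, 1], estimated beforehand by sampling


def build_target(example: Example, idk_threshold: float = 0.0) -> Tuple[str, float]:
    """Return (training target, loss weight) for one instance.

    Assumption: instances at or below the threshold are treated as out of
    scope and retargeted to the refusal string with full weight.
    """
    if example.knowledge_score <= idk_threshold:
        return IDK_RESPONSE, 1.0
    return example.answer, example.knowledge_score


def weighted_loss(per_example_nll: List[float], weights: List[float]) -> float:
    """Weight-normalized average of per-example negative log-likelihoods."""
    if len(per_example_nll) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * nll for w, nll in zip(weights, per_example_nll)) / total


if __name__ == "__main__":
    batch = [
        Example("Capital of France?", "Paris", 0.75),
        Example("Obscure fact?", "Unknown to the model", 0.0),
    ]
    targets = [build_target(ex) for ex in batch]
    weights = [w for _, w in targets]
    # Placeholder NLL values; in training these come from the model's forward pass.
    print(targets, weighted_loss([2.0, 4.0], weights))
```

Whether the weight should increase or decrease with the score, and where the refusal threshold sits, are design choices the abstract leaves open; the direction used here is only one plausible option.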
