Fine-Tuning Language Models to Know What They Know

Sangjun Park; Elliot Meyerson; Xin Qiu; Risto Miikkulainen

arXiv:2602.02605 · cs.NE · February 4, 2026

Open Access

TL;DR

This paper introduces a framework for measuring, and a training method for improving, how well large language models recognize and report what they know.

Contribution

It proposes a dual-prompt measurement framework and a novel training method, ESMA, to align models' internal knowledge with their explicit responses.

Findings

01

ESMA improves models' metacognitive ability across various settings.

02

Parameter analysis attributes the improvements to a sparse set of significant parameter modifications.

03

The framework effectively measures and enhances metacognitive skills in LLMs.

Abstract

Metacognition is a critical component of intelligence, specifically regarding the awareness of one's own knowledge. While humans rely on shared internal memory for both answering questions and reporting their knowledge state, this dependency in LLMs remains underexplored. This study proposes a framework to measure metacognitive ability $d'_{\text{type2}}$ using a dual-prompt method, followed by the introduction of Evolution Strategy for Metacognitive Alignment (ESMA) to bind a model's internal knowledge to its explicit behaviors. ESMA demonstrates robust generalization across diverse untrained settings, indicating an enhancement in the model's ability to reference its own knowledge. Furthermore, parameter analysis attributes these improvements to a sparse set of significant modifications.
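
The abstract does not spell out how the type-2 sensitivity is computed. As a rough illustration only, the sketch below uses the standard signal-detection form d' = z(hit rate) - z(false-alarm rate), where a hit is the model claiming to know an answer it actually gets right and a false alarm is claiming to know an answer it gets wrong. The function name `type2_dprime`, the paired boolean inputs (one from a direct-answer prompt, one from a self-report prompt), and the log-linear correction are assumptions made for this example, not details taken from the paper.

```python
from statistics import NormalDist


def type2_dprime(correct, claims_known):
    """Type-2 d' from paired per-question outcomes (hypothetical helper).

    correct[i]      -- True if the direct answer to question i was right.
    claims_known[i] -- True if the model reported that it knows the answer.
    """
    if len(correct) != len(claims_known):
        raise ValueError("inputs must have the same length")

    n_correct = sum(1 for c in correct if c)
    n_incorrect = len(correct) - n_correct

    # Hit: claims knowledge and is right. False alarm: claims knowledge and is wrong.
    hits = sum(1 for c, k in zip(correct, claims_known) if c and k)
    false_alarms = sum(1 for c, k in zip(correct, claims_known) if not c and k)

    # Log-linear correction (an assumption here): add 0.5 to each count and 1 to
    # each total so that neither rate is exactly 0 or 1, where z is undefined.
    hit_rate = (hits + 0.5) / (n_correct + 1)
    fa_rate = (false_alarms + 0.5) / (n_incorrect + 1)

    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    answered_right = [True, True, True, False, False, False]
    said_it_knew = [True, True, False, False, False, True]
    print(type2_dprime(answered_right, said_it_knew))
```

Under this definition, self-reports that track correctness give a positive value, while self-reports unrelated to correctness give a value near zero.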

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Language and cultural evolution