Fine-Tuning Language Models to Know What They Know
Sangjun Park, Elliot Meyerson, Xin Qiu, Risto Miikkulainen

TL;DR
This paper introduces a framework and a training method to improve large language models' ability to recognize and report their own knowledge, enhancing their metacognitive capabilities.
Contribution
It proposes a dual-prompt measurement framework and a novel training method, ESMA, to align models' internal knowledge with their explicit responses.
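The dual-prompt idea can be illustrated with a small sketch. This is one plausible reading, not the paper's actual metric: the model is queried once for the answer and once for a self-report of whether it knows the answer, and the score is how often the two agree. The names `answer_fn`, `report_fn`, and `metacognitive_agreement`, the exact-match scoring, and the toy "models" are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def metacognitive_agreement(
    answer_fn: Callable[[str], str],
    report_fn: Callable[[str], bool],
    questions: Sequence[str],
    gold: Sequence[str],
) -> float:
    """Fraction of questions where the self-report matches actual correctness.

    answer_fn: hypothetical direct-answer prompt (question -> answer text).
    report_fn: hypothetical meta prompt (question -> "I know this" as a bool).
    """
    if not questions or len(questions) != len(gold):
        raise ValueError("questions and gold must be non-empty and equal length")
    matches = 0
    for question, reference in zip(questions, gold):
        # Prompt 1: does the model actually produce the right answer?
        is_correct = answer_fn(question).strip().lower() == reference.strip().lower()
        # Prompt 2: does the model claim to know the answer?
        claims_to_know = report_fn(question)
        matches += int(is_correct == claims_to_know)
    return matches / len(questions)


# Toy stand-in for a model: a lookup table plays the role of internal knowledge.
knowledge = {"Capital of France?": "Paris", "2 + 2?": "4"}
questions = ["Capital of France?", "2 + 2?", "Capital of Atlantis?"]
gold = ["Paris", "4", "Poseidonis"]


def toy_answer(question: str) -> str:
    return knowledge.get(question, "unknown")


def aligned_report(question: str) -> bool:
    # Self-report that consults the same store used for answering.
    return question in knowledge


def overconfident_report(question: str) -> bool:
    # Self-report that always claims knowledge.
    return True


aligned_score = metacognitive_agreement(toy_answer, aligned_report, questions, gold)
overconfident_score = metacognitive_agreement(toy_answer, overconfident_report, questions, gold)
```

In this toy setup the aligned reporter agrees with its own answers on every question, while the overconfident one is wrong about the question it cannot answer.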
Findings
ESMA improves models' metacognitive ability and generalizes to diverse settings not seen during training.
Parameter analysis attributes the improvements to a sparse set of significant parameter modifications.
The framework effectively measures and enhances metacognitive skills in LLMs.
Abstract
Metacognition is a critical component of intelligence, specifically regarding the awareness of one's own knowledge. While humans rely on shared internal memory for both answering questions and reporting their knowledge state, this dependency in LLMs remains underexplored. This study proposes a framework to measure metacognitive ability using a dual-prompt method, followed by the introduction of Evolution Strategy for Metacognitive Alignment (ESMA) to bind a model's internal knowledge to its explicit behaviors. ESMA demonstrates robust generalization across diverse untrained settings, indicating an enhancement in the model's ability to reference its own knowledge. Furthermore, parameter analysis attributes these improvements to a sparse set of significant modifications.
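For readers unfamiliar with evolution strategies, the following is a minimal sketch of a generic, gradient-free evolution strategy loop on a toy objective. It is not the ESMA procedure: the abstract does not specify the algorithm's details, so the function `evolution_strategy`, its hyperparameters (`sigma`, `lr`, `population`, `steps`), and the toy fitness are all assumptions. In a metacognitive setting, the fitness would presumably be some alignment score between a model's answers and its self-reports, evaluated on perturbed model parameters.

```python
import random
from typing import Callable, List, Sequence


def evolution_strategy(
    fitness: Callable[[Sequence[float]], float],
    theta: Sequence[float],
    sigma: float = 0.1,
    lr: float = 0.05,
    population: int = 50,
    steps: int = 200,
    seed: int = 0,
) -> List[float]:
    """Maximize `fitness` by sampling Gaussian perturbations of `theta`."""
    rng = random.Random(seed)
    theta = list(theta)
    n = len(theta)
    for _ in range(steps):
        # Sample one noise vector per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(population)]
        # Score each perturbed parameter vector; no gradients are needed.
        scores = [
            fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises
        ]
        # Standardize scores so the step size does not depend on fitness scale.
        mean = sum(scores) / population
        std = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5 or 1.0
        # Move theta toward perturbations that scored above average.
        for i in range(n):
            direction = sum(
                (s - mean) / std * eps[i] for s, eps in zip(scores, noises)
            ) / population
            theta[i] += lr * direction
    return theta


# Toy objective (hypothetical): fitness peaks when theta equals `target`.
target = [1.0, -2.0]


def toy_fitness(params: Sequence[float]) -> float:
    return -sum((p - t) ** 2 for p, t in zip(params, target))


initial = [0.0, 0.0]
final = evolution_strategy(toy_fitness, initial)
```

Because the update only needs fitness evaluations, a loop of this kind can in principle optimize behavioral scores that are not differentiable, such as agreement between two separately prompted outputs.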
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Language and cultural evolution
