Metacognitive Behavioral Tuning of Large Language Models for Multi-Hop Question Answering
Ik-hwan Kim, Hyeongrok Han, Mingi Jung, Sangwon Yu, Jinseok Hong, Sang Hun Kim, Yoonyoung Choi, Sungroh Yoon

TL;DR
This paper introduces Metacognitive Behavioral Tuning (MBT), a post-training framework that injects a five-phase metacognitive structure into the reasoning traces of large language models, improving both the accuracy and the efficiency of multi-hop question answering.
Contribution
The paper proposes MBT, a post-training method that injects a five-phase metacognitive structure into reasoning traces, in two formulations (MBT-S and MBT-R), and evaluates it on multi-hop QA across HotpotQA, MuSiQue, and 2WikiMultiHopQA.
Findings
MBT achieves the highest Accuracy-Efficiency Score across model scales.
MBT reduces response length and degeneration counts significantly.
MBT's structural prior leads to earlier answers, lower redundancy, and richer reasoning phases.
Abstract
Large Language Models (LLMs) often produce incorrect answers on multi-hop question answering even when the reasoning trace already contains a correct intermediate conclusion. We attribute this gap to weak self-regulation rather than insufficient reasoning capacity. Without explicit regulation, valid intermediate conclusions are overridden by continued exploration or left unrecognized as logically sufficient. We propose Metacognitive Behavioral Tuning (MBT), a post-training framework that injects a five-phase metacognitive structure into reasoning traces. The five phases are understanding and filtering, planning, execution and monitoring, self-correction, and verification. MBT has two formulations. MBT-S synthesizes new metacognitive traces from scratch, while MBT-R rewrites the student's own traces into a metacognitive form. Across HotpotQA, MuSiQue, and 2WikiMultiHopQA, MBT attains the…
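The five phases named in the abstract can be pictured as an ordered template over a reasoning trace. The sketch below is only an illustration under stated assumptions: the bracketed section headers, the `Phase` enum, and the helpers `render_trace` and `has_all_phases_in_order` are hypothetical names invented here, not the authors' trace format or training code.

```python
from enum import Enum
from typing import Dict


class Phase(Enum):
    """The five phases listed in the abstract, in the order given there."""
    UNDERSTANDING_AND_FILTERING = 1
    PLANNING = 2
    EXECUTION_AND_MONITORING = 3
    SELF_CORRECTION = 4
    VERIFICATION = 5


def phase_header(phase: Phase) -> str:
    # Hypothetical tag format; the paper's actual markup is not shown in the source.
    return f"[{phase.name}]"


def render_trace(steps: Dict[Phase, str]) -> str:
    """Lay out one text block per phase, in phase order."""
    missing = [p.name for p in Phase if p not in steps]
    if missing:
        raise ValueError(f"missing phases: {missing}")
    return "\n".join(f"{phase_header(p)}\n{steps[p]}" for p in Phase)


def has_all_phases_in_order(trace: str) -> bool:
    """Check that every phase header appears and that the headers are ordered."""
    positions = [trace.find(phase_header(p)) for p in Phase]
    return all(pos >= 0 for pos in positions) and positions == sorted(positions)
```

A check like `has_all_phases_in_order` could, for example, filter synthesized or rewritten traces before fine-tuning, though whether the authors apply any such filter is not stated in the abstract.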
