Know When to Trust the Skill: Delayed Appraisal and Epistemic Vigilance for Single-Agent LLMs
Eren Unlu

TL;DR
This paper introduces MESA-S, a framework for enhancing trust and epistemic vigilance in single-agent LLMs through delayed appraisal and confidence separation.
Contribution
It formalizes a metacognitive architecture that separates self-confidence from source-confidence and incorporates delayed evaluation to improve reliability.
Findings
Explicit trust provenance reduces vulnerabilities.
Delayed escalation prunes unnecessary reasoning.
Decoupling confidence prevents inflation and improves trustworthiness.
Abstract
As large language models (LLMs) transition into autonomous agents integrated with extensive tool ecosystems, traditional routing heuristics increasingly succumb to context pollution and "overthinking". We argue that the bottleneck is not a deficit in algorithmic capability or skill diversity, but the absence of disciplined second-order metacognitive governance. Our scientific contribution is the computational translation of human cognitive control (specifically, delayed appraisal, epistemic vigilance, and region-of-proximal offloading) into a single-agent architecture. We introduce MESA-S (Metacognitive Skills for Agents, Single-agent), a preliminary framework that replaces a scalar confidence estimate with a vector separating self-confidence (parametric certainty) from source-confidence (trust in retrieved external procedures). By formalizing a delayed procedural…
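To make the self-confidence versus source-confidence split concrete, here is a minimal Python sketch. It is not the paper's implementation. The names `ConfidenceVector`, `Action`, and `decide`, the three-way decision rule, and both threshold values are hypothetical, chosen only to show why a two-component vector carries information that a single scalar score would lose.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical action set; the paper's actual routing options may differ.
    ANSWER_DIRECTLY = "answer_directly"  # rely on parametric knowledge
    USE_SKILL = "use_skill"              # offload to the retrieved procedure
    ESCALATE = "escalate"                # neither signal is strong enough


@dataclass(frozen=True)
class ConfidenceVector:
    self_conf: float    # parametric certainty, assumed to lie in [0, 1]
    source_conf: float  # trust in the retrieved external procedure, in [0, 1]

    def scalar(self) -> float:
        """Collapse to one number (a plain mean), shown only for contrast."""
        return (self.self_conf + self.source_conf) / 2


def decide(
    conf: ConfidenceVector,
    self_threshold: float = 0.8,    # assumed value, not from the paper
    source_threshold: float = 0.7,  # assumed value, not from the paper
) -> Action:
    """Route on each component separately instead of on a merged score."""
    if conf.self_conf >= self_threshold:
        return Action.ANSWER_DIRECTLY
    if conf.source_conf >= source_threshold:
        return Action.USE_SKILL
    return Action.ESCALATE


# Two states with the same mean confidence but different components.
trusted_skill = ConfidenceVector(self_conf=0.25, source_conf=0.75)
uncertain_both = ConfidenceVector(self_conf=0.5, source_conf=0.5)
```

In this sketch, `trusted_skill` and `uncertain_both` both collapse to a mean of 0.5, yet `decide` routes the first to `Action.USE_SKILL` and the second to `Action.ESCALATE`. A scalar confidence score could not tell these two cases apart.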
