Self-Monitoring Benefits from Structural Integration: Lessons from Metacognition in Continuous-Time Multi-Timescale Agents
Ying Xie

TL;DR
This study evaluates self-monitoring modules in reinforcement learning agents. It finds no statistically significant benefit when the modules are added as auxiliary-loss add-ons, but improvements when they are structurally integrated into the decision process.
Contribution
The paper demonstrates that structurally integrating self-monitoring modules can improve agent performance in non-stationary environments, whereas attaching them as auxiliary add-ons does not.
Findings
Self-monitoring modules attached as auxiliary-loss add-ons show no statistically significant benefit.
Structural integration of modules yields medium-large improvements.
The add-on modules collapse to near-constant outputs (confidence std < 0.006, attention allocation std < 0.011), indicating limited utility.
Abstract
Self-monitoring capabilities -- metacognition, self-prediction, and subjective duration -- are often proposed as useful additions to reinforcement learning agents. But do they actually help? We investigate this question in a continuous-time multi-timescale agent operating in predator-prey survival environments of varying complexity, including a 2D partially observable variant. We first show that three self-monitoring modules, implemented as auxiliary-loss add-ons to a multi-timescale cortical hierarchy, provide no statistically significant benefit across 20 random seeds, 1D and 2D predator-prey environments with standard and non-stationary variants, and training horizons up to 50,000 steps. Diagnosing the failure, we find the modules collapse to near-constant outputs (confidence std < 0.006, attention allocation std < 0.011) and the subjective duration mechanism shifts the discount…
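The collapse diagnosis described in the abstract amounts to checking how much a module's output varies over time. The following is a minimal sketch of such a check, not the paper's code: the function name `is_collapsed`, the default threshold of 0.01, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def is_collapsed(outputs, std_threshold=0.01):
    """Flag a module whose logged outputs are nearly constant.

    `outputs` is a 1-D sequence of scalar outputs (for example, confidence
    values logged once per step). `std_threshold` is a hypothetical cutoff,
    not a value taken from the paper.
    """
    values = np.asarray(outputs, dtype=float)
    std = float(values.std())
    return std < std_threshold, std


# Synthetic data for illustration only (not results from the paper).
rng = np.random.default_rng(0)
collapsed_conf = 0.5 + 0.001 * rng.standard_normal(1000)  # nearly constant
varied_conf = rng.uniform(0.0, 1.0, size=1000)            # spread over [0, 1]

for name, series in [("collapsed", collapsed_conf), ("varied", varied_conf)]:
    flag, std = is_collapsed(series)
    print(f"{name}: std={std:.4f} collapsed={flag}")
```

A check like this only says that an output barely moves; it does not by itself show whether the output would have been useful had it varied.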
