From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning

Shaojie Wang; Liang Zhang

arXiv:2601.21909 · cs.AI · January 30, 2026

Open Access

TL;DR

This paper introduces a cognitively inspired post-training framework for large language models that separates the learning of abstract reasoning strategies from problem-specific execution, improving generalization, reliability, and training efficiency.

Contribution

The paper proposes Chain-of-Meta-Thought (CoMT) and Confidence-Calibrated Reinforcement Learning to align model training more closely with how humans solve problems.

Findings

1. Achieves 2.19% and 4.63% improvements on benchmarks
2. Reduces training time by 65-70%
3. Cuts token consumption by 50%

Abstract

Current LLM post-training methods optimize complete reasoning trajectories through Supervised Fine-Tuning (SFT) followed by outcome-based Reinforcement Learning (RL). While effective, a closer examination reveals a fundamental gap: this approach does not align with how humans actually solve problems. Human cognition naturally decomposes problem-solving into two distinct stages: first acquiring abstract strategies (i.e., meta-knowledge) that generalize across problems, then adapting them to specific instances. In contrast, by treating complete trajectories as basic units, current methods are inherently problem-centric, entangling abstract strategies with problem-specific execution. To address this misalignment, we propose a cognitively-inspired framework that explicitly mirrors the two-stage human cognitive process. Specifically, Chain-of-Meta-Thought (CoMT) focuses supervised learning…

Taxonomy

Topics: Reinforcement Learning in Robotics · AI-based Problem Solving and Planning · Topic Modeling