Effort as Ceiling, Not Dial: Reasoning Budget Does Not Modulate Cognitive Cost Alignment Between Humans and Large Reasoning Models
Yueqing Hu, Tianhong Wang

TL;DR
This study tests whether the alignment between human cognitive cost and the reasoning effort of large reasoning models (LRMs) changes with the inference-time effort setting. It finds that alignment is invariant to that setting and appears to be fixed at training time.
Contribution
The paper shows that reasoning-effort alignment is a training-time property of LRMs that the inference-time effort setting does not modulate, which supports a compiled problem-solving account.
Findings
Alignment remains invariant across effort levels and tasks.
The effort parameter sets an upper budget on generation rather than driving real-time allocation of effort.
Model scale improves the match with human difficulty patterns.
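One way to picture the "ceiling, not dial" finding is a toy model in which the effort setting only caps trace length and never stretches it. The sketch below is illustrative only: the function name, the budget values, and the per-item token requirements are hypothetical and are not taken from the paper or from any model's actual configuration.

```python
# Toy "ceiling" model: the effort budget truncates a trace but does not lengthen it.
# All numbers below are made up for illustration.

def tokens_generated(required_tokens: int, effort_budget: int) -> int:
    """Return the trace length under a hard upper budget."""
    return min(required_tokens, effort_budget)

# Hypothetical per-level budgets (not real model settings).
BUDGETS = {"low": 500, "medium": 2000, "high": 8000}

# Hypothetical tokens each item "needs" under the trained allocation policy.
required = [120, 300, 450]

by_effort = {
    level: [tokens_generated(r, cap) for r in required]
    for level, cap in BUDGETS.items()
}
```

In this toy setting every item fits under the lowest budget, so the per-item token counts are the same at all three levels, and any correlation with human difficulty would be unchanged. A "dial" model would instead scale `required` with the effort level.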
Abstract
Large Reasoning Models (LRMs) generate chain-of-thought traces whose length tracks human reaction times across cognitive tasks, but recent debate questions whether this alignment reflects genuine computational structure or surface verbosity. We test whether the alignment varies with inference-time reasoning effort. Across GPT-OSS-20B and GPT-OSS-120B, three effort levels, and six reasoning tasks, within-task and cross-task alignment remain invariant: Bayes Factors lean toward the null, and mean alignment is numerically near-identical across conditions. A manipulation check reveals that the effort parameter sets an upper budget on generation rather than driving real-time allocation, suggesting that the allocation policy is crystallized at training time. Arithmetic complexity contrasts further show that token allocation tracks fine-grained, format-dependent human difficulty patterns, with…
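The comparison described in the abstract can be sketched as a per-condition correlation between human reaction times and trace lengths. This is a minimal sketch under stated assumptions: Pearson correlation stands in for the alignment measure, which this summary does not specify, and the function names and input layout are hypothetical. It does not reproduce the paper's Bayes Factor analysis.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

def alignment_by_effort(human_rt, tokens_by_effort):
    """Assumed alignment measure: correlate per-item human reaction times
    with per-item chain-of-thought token counts at each effort level."""
    return {
        level: pearson_r(human_rt, tokens)
        for level, tokens in tokens_by_effort.items()
    }
```

Calling `alignment_by_effort` with one list of per-item reaction times and a dict mapping each effort level to its per-item token counts returns one coefficient per level. Invariance in the abstract's sense would show up as near-identical coefficients across levels.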
