LLMs Encode How Difficult Problems Are
William Lugoloobi, Chris Russell

TL;DR
This paper investigates whether large language models internally encode problem difficulty in a way that aligns with human judgment, and how that encoding relates to their learning and performance. It finds that human-labeled difficulty is strongly encoded and tracks model behavior, whereas automated (LLM-derived) difficulty estimates do not.
Contribution
The study demonstrates that human-annotated difficulty is reliably encoded in LLMs and can guide training to improve accuracy, unlike automated difficulty measures, which become misaligned as models improve.
Findings
Human difficulty labels are strongly linearly decodable from LLMs.
Model performance improves when steering towards 'easier' problem representations.
Automated difficulty estimates degrade and misalign during model training.
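What "linearly decodable" means in practice can be sketched with a per-layer linear probe. This is a minimal illustration, not the authors' code. It assumes three things: the probe is a closed-form ridge regression, quality is scored as held-out Spearman rank correlation, and synthetic activations with a planted "difficulty direction" stand in for real hidden states. The helper names (`fit_ridge_probe`, `probe_each_layer`) and the per-layer signal strengths are hypothetical.

```python
import numpy as np

def fit_ridge_probe(x, y, alpha=1.0):
    """Closed-form ridge regression: weights w and bias b such that x @ w + b ~ y."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    w = np.linalg.solve(xc.T @ xc + alpha * np.eye(x.shape[1]), xc.T @ yc)
    return w, y_mean - x_mean @ w

def spearman(a, b):
    """Spearman rank correlation; assumes no ties (true for continuous values)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

def probe_each_layer(acts, difficulty, n_train):
    """acts: (num_layers, num_problems, hidden_dim). Returns held-out Spearman per layer."""
    scores = []
    for layer_acts in acts:
        w, b = fit_ridge_probe(layer_acts[:n_train], difficulty[:n_train])
        preds = layer_acts[n_train:] @ w + b
        scores.append(spearman(preds, difficulty[n_train:]))
    return scores

# Synthetic stand-in for real activations (an assumption): Gaussian noise plus a
# planted difficulty direction whose strength grows with depth.
rng = np.random.default_rng(0)
num_problems, hidden_dim = 400, 64
difficulty = rng.uniform(1.0, 5.0, size=num_problems)  # hypothetical human labels
direction = rng.normal(size=hidden_dim)
direction /= np.linalg.norm(direction)
signal_per_layer = [0.0, 0.5, 2.0]
acts = np.stack([
    rng.normal(size=(num_problems, hidden_dim)) + s * np.outer(difficulty, direction)
    for s in signal_per_layer
])

scores = probe_each_layer(acts, difficulty, n_train=300)
best_layer = int(np.argmax(scores))
```

Ridge regression keeps the probe strictly linear, and a small `alpha` stabilises the solve when the hidden size is large relative to the number of labeled problems. A rank correlation only asks whether the probe orders problems by difficulty correctly, which suits ordinal human labels.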
Abstract
Large language models exhibit a puzzling inconsistency: they solve complex problems yet frequently fail on seemingly simpler ones. We investigate whether LLMs internally encode problem difficulty in a way that aligns with human judgment, and whether this representation tracks generalization during reinforcement learning post-training. We train linear probes across layers and token positions on 60 models, evaluating on mathematical and coding subsets of Easy2HardBench. We find that human-labeled difficulty is strongly linearly decodable (AMC: ) and exhibits clear model-size scaling, whereas LLM-derived difficulty is substantially weaker and scales poorly. Steering along the difficulty direction reveals that pushing models toward "easier" representations reduces hallucination and improves accuracy. During GRPO training on Qwen2.5-Math-1.5B, the human-difficulty probe…
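The steering result can be illustrated with a minimal sketch. It assumes steering means shifting a layer's hidden states along the normalised weight vector of the difficulty probe, a common form of activation steering. The paper's exact layer, scaling, and sign convention are not given here, so the function names and values below are hypothetical. Random vectors stand in for real activations and a trained probe.

```python
import numpy as np

def steer_toward_easier(hidden, probe_w, alpha):
    """Shift hidden states against the difficulty direction.

    hidden:  (..., hidden_dim) activations at the probed layer
    probe_w: (hidden_dim,) linear-probe weights; larger projection = "harder" (assumed)
    alpha:   steering strength, in units of the normalised direction
    """
    direction = probe_w / np.linalg.norm(probe_w)
    return hidden - alpha * direction

def probe_difficulty(hidden, probe_w, probe_b):
    """Difficulty the linear probe assigns to each hidden state."""
    return hidden @ probe_w + probe_b

rng = np.random.default_rng(1)
probe_w = rng.normal(size=16)      # hypothetical trained probe weights
probe_b = 3.0
hidden = rng.normal(size=(4, 16))  # hypothetical activations for 4 prompts

before = probe_difficulty(hidden, probe_w, probe_b)
after = probe_difficulty(steer_toward_easier(hidden, probe_w, alpha=2.0), probe_w, probe_b)
# Each probe prediction drops by exactly alpha * ||probe_w||.
drop = before - after
```

In a real model, the shift would be applied at the probed layer during generation, for example inside a PyTorch forward hook registered with `register_forward_hook`. The sketch only checks the arithmetic: subtracting `alpha` units of the normalised direction lowers the probe's difficulty estimate by `alpha * ||probe_w||` and leaves components orthogonal to the probe untouched.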
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
