TL;DR
CORE introduces a reinforcement learning framework that enhances mathematical reasoning in large language models by integrating explicit concept supervision, leading to improved understanding and application of concepts.
Contribution
The paper presents CORE, a novel RL training method that uses explicit concepts as supervision signals to improve models' conceptual reasoning in math tasks.
Findings
CORE improves performance on in-domain concept exercises.
CORE enhances out-of-domain math benchmark results.
Concept-aligned quizzes and concept-injected rollouts boost reasoning accuracy.
Abstract
Large language models (LLMs) often solve challenging math exercises yet fail to apply the concept correctly when the problem requires genuine understanding. Popular Reinforcement Learning with Verifiable Rewards (RLVR) pipelines reinforce final answers but provide little fine-grained conceptual signal, so models improve at pattern reuse rather than conceptual application. We introduce CORE (Concept-Oriented REinforcement), an RL training framework that turns explicit concepts into a controllable supervision signal. Starting from a high-quality, low-contamination textbook resource that links verifiable exercises to concise concept descriptions, we run a sanity probe showing that LLMs can restate definitions but fail concept-linked quizzes, quantifying the conceptual reasoning gap. CORE then (i) synthesizes concept-aligned quizzes, (ii) injects brief concept snippets during rollouts to elicit…
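As a rough illustration of the idea of injecting a brief concept snippet into a rollout prompt and scoring the result with a verifiable final-answer reward, here is a minimal sketch. It is not the paper's implementation: the `Exercise` record, `build_prompt`, `verifiable_reward`, the prompt layout, and the exact-match scoring rule are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Exercise:
    """Hypothetical record: a verifiable exercise linked to a concept description."""
    question: str
    answer: str
    concept: str


def build_prompt(ex: Exercise, inject_concept: bool) -> str:
    """Build a rollout prompt, optionally prefixed with the concept snippet.

    The prompt layout is an assumption made for illustration only.
    """
    parts = []
    if inject_concept:
        parts.append(f"Concept: {ex.concept}")
    parts.append(f"Question: {ex.question}")
    parts.append("Answer:")
    return "\n".join(parts)


def verifiable_reward(model_answer: str, ex: Exercise) -> float:
    """Return 1.0 if the final answer matches the reference exactly, else 0.0.

    Exact string match after stripping whitespace is a simplification;
    a real verifier would likely normalize mathematical expressions.
    """
    return 1.0 if model_answer.strip() == ex.answer.strip() else 0.0


if __name__ == "__main__":
    ex = Exercise(
        question="What is the derivative of x^2?",
        answer="2x",
        concept="Power rule: d/dx x^n = n * x^(n-1)",
    )
    print(build_prompt(ex, inject_concept=True))
    print(verifiable_reward(" 2x ", ex))
```

Comparing rewards for rollouts generated with and without `inject_concept` is one simple way to see whether the concept snippet changes the model's answers on a given exercise.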