ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning
Yu Li, Rui Miao, Zhengling Qi, Tian Lan

TL;DR
ARISE introduces a hierarchical reinforcement learning framework that evolves a skill library to improve mathematical reasoning in language models, outperforming existing methods especially on out-of-distribution tasks.
Contribution
The paper presents ARISE, a novel hierarchical RL approach with a skill library that co-evolves with reasoning ability, enabling reusable strategies and improved performance on mathematical reasoning benchmarks.
Findings
ARISE outperforms GRPO-family algorithms and baselines on multiple benchmarks.
Skill library quality and reasoning performance improve together during training.
ARISE achieves significant gains on out-of-distribution tasks.
Abstract
The dominant paradigm for improving mathematical reasoning in language models relies on reinforcement learning with verifiable rewards. Yet existing methods treat each problem instance in isolation, without leveraging the reusable strategies that emerge and accumulate during training. To this end, we introduce ARISE (Agent Reasoning via Intrinsic Skill Evolution), a hierarchical reinforcement learning framework in which a shared policy operates both at the high level, to manage skills, and at the low level, to generate responses (denoted as a Skills Manager and a Worker, respectively). The Manager maintains a tiered skill library through a dedicated skill generation rollout that performs structured summarization of successful solution traces (after execution), while employing a policy-driven selection mechanism to retrieve relevant skills to condition future rollouts (before execution). A…
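The Manager loop described in the abstract (summarize successful traces after execution, select skills before execution) can be illustrated with a small sketch. This is not the authors' implementation. The two tier names, the promotion threshold, the `summarize` helper, and the word-overlap selector are all assumptions made for illustration; in particular, the word-overlap score is a stand-in for the policy-driven selection mechanism, and `summarize` is a stand-in for the model's structured summarization.

```python
from dataclasses import dataclass, field


def summarize(trace: str) -> str:
    """Stand-in for structured summarization: keep the first line of the trace."""
    return trace.strip().splitlines()[0]


@dataclass
class Skill:
    name: str
    description: str
    uses: int = 0
    successes: int = 0


@dataclass
class SkillLibrary:
    # Two tiers are an assumption; the abstract only says the library is "tiered".
    active: dict = field(default_factory=dict)
    candidate: dict = field(default_factory=dict)
    promote_after: int = 2  # hypothetical promotion threshold

    def add_from_trace(self, name: str, trace: str, reward: float):
        """After execution: turn a successful trace into a candidate skill."""
        if reward <= 0:
            return None  # only successful traces are summarized
        existing = self.active.get(name) or self.candidate.get(name)
        if existing is not None:
            return existing
        skill = Skill(name, summarize(trace))
        self.candidate[name] = skill
        return skill

    def select(self, problem: str, k: int = 1):
        """Before execution: pick up to k skills to condition the next rollout."""
        words = set(problem.lower().split())
        pool = list(self.active.values()) + list(self.candidate.values())
        scored = [(len(words & set(s.description.lower().split())), s) for s in pool]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [s for _, s in scored[:k]]

    def record(self, skill: Skill, reward: float) -> None:
        """Update usage statistics and promote a candidate that keeps succeeding."""
        skill.uses += 1
        if reward > 0:
            skill.successes += 1
        if skill.name in self.candidate and skill.successes >= self.promote_after:
            self.active[skill.name] = self.candidate.pop(skill.name)


lib = SkillLibrary()
lib.add_from_trace(
    "complete_square",
    "Complete the square to solve the quadratic equation.\nx^2 + 2x = 3 ...",
    reward=1.0,
)
picked = lib.select("solve the quadratic equation x^2 + 2x = 3")
for s in picked:
    lib.record(s, reward=1.0)
```

In this sketch the Worker would receive the descriptions of the selected skills as extra context for its next rollout, and the verifiable reward of that rollout would be fed back through `record`.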
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications
