CARL: Criticality-Aware Agentic Reinforcement Learning

Leyang Shen; Yang Zhang; Chun Kai Ling; Xiaoyan Zhao; Tat-Seng Chua

arXiv:2512.04949·cs.LG·May 12, 2026

TL;DR

CARL is a reinforcement learning algorithm that uses entropy to identify critical states, focusing training on them to improve performance and efficiency in long-horizon tasks.

Contribution

The paper introduces CARL, a novel criticality-aware RL method that selectively updates actions from high-criticality states, enhancing learning efficiency and effectiveness.

Findings

01. CARL outperforms traditional methods in diverse tasks.

02. It achieves higher efficiency by focusing on critical states.

03. Experimental results confirm improved performance and resource utilization.

Abstract

Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conventional group-level policy optimization algorithm becomes suboptimal because of its underlying assumption that each step holds equal contribution, which deviates significantly from reality. Our analysis reveals that only the action choices on a small fraction of states are critical in determining the final outcome. Building on this insight, we propose CARL, a criticality-aware reinforcement learning algorithm tailored for long-horizon agentic reasoning. CARL leverages entropy as a heuristic proxy for state criticality and achieves focused training by assigning rewards to actions taken from high-criticality states while excluding actions taken from low-criticality states from model updates, avoiding…
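The focused-training idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the top-fraction selection rule, the `keep_fraction` parameter, the single trajectory-level advantage, and the function names `entropy`, `critical_mask`, and `masked_advantages` are all assumptions made for illustration. The sketch scores each step by the entropy of its action distribution, marks the highest-entropy steps as critical, and zeroes the learning signal for the rest so they would not contribute to a policy update.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def critical_mask(step_probs: Sequence[Sequence[float]],
                  keep_fraction: float = 0.25) -> List[bool]:
    """Mark the highest-entropy steps of a trajectory as critical.

    Assumption: criticality is decided by keeping the top `keep_fraction`
    of steps ranked by entropy. The paper may use a different rule.
    """
    entropies = [entropy(p) for p in step_probs]
    k = max(1, math.ceil(keep_fraction * len(entropies)))
    ranked = sorted(range(len(entropies)),
                    key=lambda i: entropies[i], reverse=True)
    mask = [False] * len(entropies)
    for i in ranked[:k]:
        mask[i] = True
    return mask


def masked_advantages(advantage: float, mask: Sequence[bool]) -> List[float]:
    """Give the trajectory-level advantage to critical steps only.

    Non-critical steps receive 0.0, so they add nothing to a
    policy-gradient loss weighted by these values (hypothetical setup).
    """
    return [advantage if is_critical else 0.0 for is_critical in mask]


# Toy trajectory: four steps, four possible actions per step.
trajectory = [
    [0.97, 0.01, 0.01, 0.01],  # near-deterministic choice, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform choice, highest entropy
    [0.90, 0.10, 0.00, 0.00],  # fairly confident choice
    [0.50, 0.50, 0.00, 0.00],  # two-way split
]
mask = critical_mask(trajectory, keep_fraction=0.5)
weights = masked_advantages(1.0, mask)
print(mask)     # [False, True, False, True]
print(weights)  # [0.0, 1.0, 0.0, 1.0]
```

In this toy trajectory the uniform and two-way-split steps are kept, while the two confident steps are excluded from the update.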
