PRIME: Policy-Reinforced Iterative Multi-agent Execution for Algorithmic Reasoning in Large Language Models

Jiawei Xu; Zhenyu Yu; Ziqian Bi; Minh Duc Pham; Xiaoyi Qu; Danyang Zhang

arXiv:2602.11170 · cs.CL · February 13, 2026

TL;DR

PRIME is a multi-agent framework trained with reinforcement learning that improves algorithmic reasoning in large language models, raising average accuracy from 26.8% to 93.8% across diverse algorithmic tasks.

Contribution

The paper presents PRIME, a multi-agent framework for algorithmic reasoning trained with reinforcement learning, and PRIME-Bench, which the authors describe as the largest algorithmic reasoning benchmark to date (86 tasks across 12 categories, 51,600 instances).

Findings

01. Accuracy improved from 26.8% to 93.8% on average.

02. Major gains in tasks requiring sustained state tracking.

03. Smaller models benefit disproportionately, matching larger models' performance.

Abstract

Large language models have demonstrated remarkable capabilities across diverse reasoning tasks, yet their performance on algorithmic reasoning remains limited. To address this limitation, we propose PRIME (Policy-Reinforced Iterative Multi-agent Execution), a framework comprising three specialized agents: an executor for step-by-step reasoning, a verifier for constraint checking, and a coordinator for backtracking control. The framework is optimized through group relative policy optimization. For comprehensive evaluation, we introduce PRIME-Bench, the largest algorithmic reasoning benchmark to date, comprising 86 tasks across 12 categories with 51,600 instances. Tasks span sorting algorithms, graph and tree structures, automata and state machines, symbolic reasoning, and constraint-based puzzles, with execution traces reaching over one million steps. Compared to the baseline approach, PRIME improves average…
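The executor, verifier, and coordinator roles named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not describe the agents' interfaces, the backtracking policy, or the training procedure, so every name below (`executor_step`, `verifier_ok`, `coordinate`, `make_flaky_executor`) is hypothetical, the agents are plain functions rather than language models, and the "backtracking" is simply discarding a rejected step and retrying from the last verified state. The task is a bubble-sort trace, chosen because sorting is one of the listed task families.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

State = List[int]
Proposer = Callable[[State], Optional[State]]


def executor_step(state: State) -> Optional[State]:
    """Executor (hypothetical): swap the first adjacent out-of-order pair."""
    for i in range(len(state) - 1):
        if state[i] > state[i + 1]:
            nxt = state.copy()
            nxt[i], nxt[i + 1] = nxt[i + 1], nxt[i]
            return nxt
    return None  # no inversion left, nothing to propose


def verifier_ok(prev: State, nxt: State) -> bool:
    """Verifier (hypothetical): the step must keep the same multiset of
    values and be exactly one adjacent swap that removes an inversion."""
    if Counter(prev) != Counter(nxt):
        return False
    diff = [i for i in range(len(prev)) if prev[i] != nxt[i]]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and prev[diff[0]] > prev[diff[1]]
    )


def coordinate(initial: State, propose: Proposer,
               max_retries: int = 3) -> Tuple[List[State], int]:
    """Coordinator (hypothetical): accept verified steps; on a rejected
    step, stay at the last verified state and ask the executor again."""
    trace = [list(initial)]
    rejected = 0
    retries = 0
    while trace[-1] != sorted(trace[-1]):
        candidate = propose(list(trace[-1]))
        if candidate is not None and verifier_ok(trace[-1], candidate):
            trace.append(candidate)
            retries = 0
        else:
            rejected += 1
            retries += 1
            if retries > max_retries:
                raise RuntimeError("executor failed repeatedly at one state")
    return trace, rejected


def make_flaky_executor(fail_on_call: int) -> Proposer:
    """Wrap the executor so that one chosen call returns a corrupted state,
    standing in for a reasoning error that the verifier should catch."""
    calls = {"n": 0}

    def propose(state: State) -> Optional[State]:
        calls["n"] += 1
        if calls["n"] == fail_on_call and state:
            bad = state.copy()
            bad[0] += 100  # corrupt a value so the multiset check fails
            return bad
        return executor_step(state)

    return propose


if __name__ == "__main__":
    trace, rejected = coordinate([3, 1, 2], make_flaky_executor(2))
    print(trace)     # verified states only, ending at [1, 2, 3]
    print(rejected)  # 1: the corrupted second proposal was discarded
```

Keeping the verifier separate from the executor means the trace contains only checked states, so a single bad step costs one retry instead of silently corrupting everything that follows. That is one plausible reading of why such a split would help on tasks with long execution traces.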

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications