DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs

Yuanhao Li; Mingshan Liu; Hongbo Wang; Yiding Zhang; Yifei Ma; Wei Tan

arXiv:2511.20468·cs.AI·November 26, 2025

TL;DR

DRAFT-RL introduces a multi-agent reinforcement learning framework with Chain-of-Draft reasoning, enabling diverse multi-path exploration and peer-guided reflection to improve LLM performance on complex reasoning tasks.

Contribution

DRAFT-RL integrates Chain-of-Draft reasoning into multi-agent RL training: each agent produces multiple drafts per query, and peer evaluation selects among them, with the aim of making reasoning more robust and interpretable.

Findings

01. Outperforms existing reflective agents in accuracy

02. Achieves faster convergence in complex tasks

03. Enhances reasoning robustness through multi-path exploration

Abstract

Large Language Models (LLMs) have shown impressive capabilities in multi-step reasoning and problem-solving. Recent works introduce multi-agent reflection frameworks where multiple LLM agents critique and refine each other's outputs using reinforcement learning (RL). However, these approaches often rely on single-shot responses and lack structural diversity in reasoning exploration. In this paper, we propose DRAFT-RL, a novel framework that integrates Chain-of-Draft (CoD) reasoning into multi-agent RL training. Instead of generating single responses, each agent produces multiple drafts per query, which are then evaluated by peer agents and a learned reward model to identify the most promising trajectory. These selected drafts are used to refine future reasoning strategies through actor-critic learning. DRAFT-RL enables explicit multi-path exploration, peer-guided reflection, and…
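
The draft-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Draft` type, the `select_draft` function, the `peer_weight` mixing parameter, and the rule of averaging peer scores before combining them with the reward-model score are all assumptions made for illustration. The actor-critic update that consumes the selected draft is not shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class Draft:
    agent_id: int  # which agent wrote this draft
    text: str      # the draft's reasoning text


Scorer = Callable[[Draft], float]


def select_draft(
    drafts: Sequence[Draft],
    peer_scorers: Dict[int, Scorer],  # hypothetical: agent id -> scoring function
    reward_model: Scorer,             # hypothetical stand-in for the learned reward model
    peer_weight: float = 0.5,         # assumed mixing weight, not taken from the paper
) -> Tuple[Draft, float]:
    """Return the draft with the highest combined score, and that score."""
    if not drafts:
        raise ValueError("need at least one draft")

    def combined(draft: Draft) -> float:
        # Only agents other than the author act as peers for this draft.
        peer_scores = [
            score(draft)
            for agent_id, score in peer_scorers.items()
            if agent_id != draft.agent_id
        ]
        peer = mean(peer_scores) if peer_scores else 0.0
        return peer_weight * peer + (1.0 - peer_weight) * reward_model(draft)

    best = max(drafts, key=combined)
    return best, combined(best)


if __name__ == "__main__":
    drafts = [Draft(0, "short"), Draft(1, "a longer draft")]
    scorers = {0: lambda d: float(len(d.text)), 1: lambda d: 1.0}
    best, score = select_draft(drafts, scorers, reward_model=lambda d: 0.0)
    print(best.agent_id, score)
```

In this toy usage the scorers are simple stand-ins; in a real system each would presumably be an LLM-based critic and the reward model would be trained, as the abstract describes.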

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques