Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization
Dong Qiu, Duo Xu, Limengxi Yue

TL;DR
This paper introduces a reinforcement learning framework for large language model agents that improves collaboration and performance on multi-agent tasks, achieving a 3x speedup over single-agent baselines along with higher output quality.
Contribution
It formulates multi-agent LLM cooperation as a Dec-POMDP, trains it with centralized training and decentralized execution (CTDE), and introduces Group Relative Policy Optimization (GRPO) for multi-agent LLM coordination.
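To make the GRPO idea concrete, here is a minimal sketch of the group-relative advantage and clipped surrogate used in GRPO as originally described for single-policy LLM training (DeepSeekMath). This summary does not give the paper's multi-agent variant, so treating each group member as one joint rollout of all agents scored with a shared team reward is an assumption, and the function names are hypothetical. The KL penalty to a reference policy is omitted for brevity.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: score each sampled rollout against its own group.

    No learned value function is used; the group mean acts as the baseline.
    Assumption: each reward is the shared team reward of one joint rollout.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective term for one rollout (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical team rewards for four joint rollouts sampled for one task.
    rewards = [1.0, 0.0, 0.5, 0.5]
    print(group_relative_advantages(rewards))  # ≈ [1.414, -1.414, 0.0, 0.0]
```

Because the baseline is the group mean rather than a critic, rollouts are only rewarded for beating their siblings on the same task, which is what makes the method "group relative".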
Findings
3x faster task processing than single-agent baselines
98.7% structural/style consistency in collaborative writing
74.6% test pass rate in coding tasks
Abstract
Large Language Models (LLMs) perform well in language tasks but often lack collaborative awareness and struggle to optimize global performance in multi-agent settings. We present a reinforcement learning-augmented LLM agent framework that formulates cooperation as a decentralized partially observable Markov decision process (Dec-POMDP) and adopts centralized training with decentralized execution (CTDE). We introduce Group Relative Policy Optimization (GRPO) to jointly optimize agent policies with access to global signals during training, together with a simplified joint reward that balances task quality, speed, and coordination cost. On collaborative writing and coding benchmarks, our framework delivers a 3x increase in task processing speed over single-agent baselines, 98.7% structural/style consistency in writing, and a 74.6% test pass rate in coding. The approach consistently…
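The abstract describes a simplified joint reward balancing task quality, speed, and coordination cost but does not state its exact form. The sketch below is one plausible weighted-sum version under loudly stated assumptions: the linear speed term, the use of inter-agent message count as the coordination cost, the latency budget, and all weight values are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; not values reported by the paper.
    quality: float = 1.0
    speed: float = 0.5
    coordination: float = 0.1


def joint_reward(quality, latency_s, num_messages,
                 weights=RewardWeights(), latency_budget_s=60.0):
    """Shared team reward trading off quality, speed, and coordination cost.

    quality:      task score in [0, 1] (e.g. fraction of tests passed)
    latency_s:    wall-clock time for the whole multi-agent episode
    num_messages: inter-agent messages, used here as a proxy for coordination cost
    """
    # Speed term decays linearly from 1 (instant) to 0 (at or beyond the budget).
    speed = max(0.0, 1.0 - latency_s / latency_budget_s)
    return (weights.quality * quality
            + weights.speed * speed
            - weights.coordination * num_messages)


if __name__ == "__main__":
    # 80% of tests pass, 30 s of a 60 s budget, 4 messages exchanged.
    print(joint_reward(0.8, 30.0, 4))  # ≈ 0.65
```

Every agent in a joint rollout receives the same scalar, which fits centralized training: the global signal is available during training, while each agent still acts only on its own observations at execution time. A shared scalar like this can be computed for each sampled joint rollout and then normalized within its group, as GRPO does.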
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
