Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning

Hao Ma; Tianyi Hu; Zhiqiang Pu; Boyin Liu; Xiaolin Ai; Yanyan Liang; Min Chen

arXiv:2410.06101·cs.AI·February 25, 2025

TL;DR

This paper introduces CORY, a multi-agent reinforcement learning framework for fine-tuning large language models. Compared with traditional PPO methods, it improves performance and training robustness and reduces distribution collapse.

Contribution

CORY extends RL fine-tuning of LLMs to a cooperative multi-agent setting with role exchange, improving stability and effectiveness over existing methods.

Findings

01 CORY outperforms PPO in policy optimality.

02 CORY demonstrates increased resistance to distribution collapse.

03 CORY shows improved training robustness.

Abstract

Reinforcement learning (RL) has emerged as a pivotal technique for fine-tuning large language models (LLMs) on specific tasks. However, prevailing RL fine-tuning methods predominantly rely on PPO and its variants. Though these algorithms are effective in general RL settings, they often exhibit suboptimal performance and vulnerability to distribution collapse when applied to the fine-tuning of LLMs. In this paper, we propose CORY, extending the RL fine-tuning of LLMs to a sequential cooperative multi-agent reinforcement learning framework, to leverage the inherent coevolution and emergent capabilities of multi-agent systems. In CORY, the LLM to be fine-tuned is initially duplicated into two autonomous agents: a pioneer and an observer. The pioneer generates responses based on queries, while the observer generates responses using both the queries and the pioneer's responses. The two…
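
The data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `ToyLLM`, `make_agents`, `cory_step`, and `exchange_roles` are hypothetical names, the toy model only echoes its prompt, and the way the query and the pioneer's response are joined into the observer's prompt is an assumption. The reward computation and the RL update are omitted because the abstract excerpt is truncated before describing them.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyLLM:
    """Hypothetical stand-in for an LLM; it only tags and echoes its prompt."""
    name: str

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] reply to: {prompt}"


def make_agents(base: ToyLLM) -> tuple[ToyLLM, ToyLLM]:
    """Duplicate the base model into two independent agents."""
    agent_a = copy.deepcopy(base)
    agent_a.name = "A"
    agent_b = copy.deepcopy(base)
    agent_b.name = "B"
    return agent_a, agent_b


def cory_step(pioneer: ToyLLM, observer: ToyLLM, query: str) -> tuple[str, str]:
    """One forward pass: the pioneer sees the query only, while the observer
    sees the query together with the pioneer's response."""
    pioneer_response = pioneer.generate(query)
    # Assumed prompt format: plain newline concatenation.
    observer_prompt = f"{query}\n{pioneer_response}"
    observer_response = observer.generate(observer_prompt)
    return pioneer_response, observer_response


def exchange_roles(pioneer: ToyLLM, observer: ToyLLM) -> tuple[ToyLLM, ToyLLM]:
    """Swap roles; when and how often this happens is not stated here."""
    return observer, pioneer


if __name__ == "__main__":
    pioneer, observer = make_agents(ToyLLM("base"))
    print(cory_step(pioneer, observer, "What is 2 + 2?"))
    pioneer, observer = exchange_roles(pioneer, observer)
    print(cory_step(pioneer, observer, "What is 2 + 2?"))
```

In a real setup each agent would be a separate copy of the model weights with its own optimizer state, and the two responses would be scored and used for policy updates.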

Code & Models

Repositories

Harry67Hu/CORY
PyTorch · Official

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Reinforcement Learning in Robotics · Digital Rights Management and Security

Methods: Attention Is All You Need · Residual Connection · Attention Dropout · Discriminative Fine-Tuning · Linear Layer · Weight Decay · Cosine Annealing · Dropout · Byte Pair Encoding