Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Shuang Li, Igor Mordatch

TL;DR
This paper introduces a multiagent finetuning approach where multiple language models independently specialize through diverse interactions, enabling sustained self-improvement and reasoning diversity beyond traditional single-agent methods.
Contribution
It proposes a novel multiagent finetuning framework that enhances model specialization and diversity, leading to improved autonomous self-improvement over multiple rounds.
Findings
Enables preservation of diverse reasoning chains.
Improves performance across reasoning tasks.
Outperforms single-agent self-improvement methods.
Abstract
Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models. A group of language models, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result,…
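The data flow described in the abstract (several copies of one base model, each later finetuned on its own independently collected data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the agents are toy functions rather than language models, the names `make_agent`, `majority_answer`, and `build_independent_datasets` are hypothetical, and filtering each agent's responses by a majority vote is one plausible way to turn multiagent interactions into per-agent training data, not necessarily the authors' exact procedure. The finetuning step itself is omitted.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for a language model call: question in, answer out.
Agent = Callable[[str], str]


def make_agent(wrong_on: Set[str]) -> Agent:
    """Toy agent that adds two integers but is off by one on chosen questions."""
    def agent(question: str) -> str:
        left, right = question.split("+")
        total = int(left) + int(right)
        return str(total + 1 if question in wrong_on else total)
    return agent


def majority_answer(answers: Iterable[str]) -> str:
    """Most common answer; ties resolve to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def build_independent_datasets(
    agents: Dict[str, Agent], questions: List[str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Give each agent its own dataset: only its own responses that agree
    with the group's majority answer (an assumed filtering rule)."""
    datasets: Dict[str, List[Tuple[str, str]]] = {name: [] for name in agents}
    for question in questions:
        responses = {name: agent(question) for name, agent in agents.items()}
        consensus = majority_answer(responses.values())
        for name, response in responses.items():
            if response == consensus:
                datasets[name].append((question, response))
    return datasets


# All agents share the same "base" behavior but err on different questions.
agents = {
    "A": make_agent(set()),
    "B": make_agent({"1+1"}),
    "C": make_agent({"2+2"}),
}
datasets = build_independent_datasets(agents, ["1+1", "2+2", "3+3"])
for name, examples in datasets.items():
    print(name, examples)
```

Because each agent keeps only its own agreeing responses, the three datasets differ even though the agents start from the same behavior. In a real pipeline each dataset would be used to finetune the corresponding model copy, and the process would repeat over multiple rounds.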
Peer Reviews
Decision · ICLR 2025 Poster
1. The paper is easy to follow and the content is well-organized. 2. The paper proposes a method for agent self-improvement fine-tuning based on multi-agent collaboration, allowing for multiple rounds of self-improvement fine-tuning, which could be a promising approach.
My primary concerns with this paper are centered around the experimental section. (Major) The first concern is regarding the selection of experimental datasets. The paper exclusively uses mathematical language reasoning tasks, and each task is not particularly challenging. Arithmetic is limited to arithmetic operations, GSM corresponds only to Grade School level difficulty, and MATH selects only the first three levels. If the tasks are not challenging enough, it may lead to questioning the need
- Significance: This paper presents a promising approach to LLM self-improvement and could offer a valuable contribution. - Clarity: Most of the Figures in the paper are clear and the paper is generally well-written.
There are several comments I would like the authors to address to make some details clearer and the paper more complete. **Major comments** 1. Role Specialization: The paper introduces distinct roles for models (generation agents and critic agents). However, it would be helpful to clarify the specific objectives each role optimizes. Additionally, I suggest emphasizing that only two roles are used in this paper (generation and critic) to avoid confusion. 2. Zero-shot Generalization: In Section
1. Jointly optimizing the LLM in the roles of generators and critics appears to be a robust method for enhancing the reasoning ability of LLMs. 2. The work shows that finetuning multiple LLMs on independent datasets derived from multi-agent debate can preserve diversity, which is a critical challenge for LLM finetuning. 3. The evaluation results show the strength of the proposed method.
1. The title “Multiagent Finetuning of Language Models” may imply a broader scope than the paper addresses. Multi-agent applications of language models can indicate a much broader range of settings besides reasoning tasks and multi-agent debate, such as gaming and social simulation; however, this work focuses solely on multi-agent debate. 2. The terms “Single Agent” and “Multi Agent” are vague and unclear in this paper. For example, Sec 2.2 “Fine-tuning Single Agent” discusses scenarios involving
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies
Methods: Sparse Evolutionary Training · Balanced Selection
