Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

Vighnesh Subramaniam; Yilun Du; Joshua B. Tenenbaum; Antonio Torralba; Shuang Li; Igor Mordatch

arXiv:2501.05707·cs.CL·March 4, 2025·2 cites

Open Access · 3 Reviews

TL;DR

This paper introduces a multiagent finetuning approach where multiple language models independently specialize through diverse interactions, enabling sustained self-improvement and reasoning diversity beyond traditional single-agent methods.

Contribution

It proposes a novel multiagent finetuning framework that enhances model specialization and diversity, leading to improved autonomous self-improvement over multiple rounds.

Findings

1. Enables preservation of diverse reasoning chains.
2. Improves performance across reasoning tasks.
3. Outperforms single-agent self-improvement methods.

Abstract

Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models. A group of language models, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result,…
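The data flow the abstract describes (several copies of one base model, each updated on its own data drawn from a shared multiagent interaction) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the `Agent` class, the stub `respond` functions, the majority-vote consensus, and the rule that an agent keeps only its own consensus-matching answers are hypothetical and are not taken from the paper. The actual finetuning step is omitted.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Stand-in for one copy of the base model (hypothetical structure)."""
    name: str
    respond: Callable[[str], str]  # stub in place of real model inference
    dataset: List[Dict[str, str]] = field(default_factory=list)


def collect_round(agents: List[Agent], questions: List[str]) -> None:
    """Build one independent finetuning set per agent from shared interactions."""
    for q in questions:
        answers = {a.name: a.respond(q) for a in agents}
        # Assumption: the consensus answer is the majority vote across agents.
        consensus, _ = Counter(answers.values()).most_common(1)[0]
        for a in agents:
            # Assumption: an agent keeps only its OWN answers that match consensus,
            # so the per-agent datasets differ from one another.
            if answers[a.name] == consensus:
                a.dataset.append({"question": q, "answer": answers[a.name]})


def make_stub(bias: int) -> Callable[[str], str]:
    """Toy 'model': answers 'x+y' sums, but is off by one when the sum is divisible by bias."""
    def respond(q: str) -> str:
        x, y = (int(t) for t in q.split("+"))
        s = x + y
        return str(s + 1) if s % bias == 0 else str(s)
    return respond


agents = [Agent(f"agent_{i}", make_stub(b)) for i, b in enumerate([2, 3, 5])]
collect_round(agents, ["1+1", "1+2", "2+3", "3+4"])
per_agent_questions = {a.name: [d["question"] for d in a.dataset] for a in agents}
```

In this toy run each agent ends up with a different set of questions, which is the property the abstract attributes to training on independent sets of data. A real pipeline would follow `collect_round` with a finetuning update per agent and repeat over several rounds.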

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The paper is easy to follow and the content is well-organized.
2. The paper proposes a method for agent self-improvement fine-tuning based on multi-agent collaboration, allowing for multiple rounds of self-improvement fine-tuning, which could be a promising approach.

Weaknesses

My primary concerns with this paper are centered around the experimental section. (Major) The first concern is regarding the selection of experimental datasets. The paper exclusively uses mathematical language reasoning tasks, and none of the tasks is particularly challenging. Arithmetic is limited to arithmetic operations, GSM corresponds only to grade-school-level difficulty, and MATH selects only the first three levels. If the tasks are not challenging enough, it may lead to questioning the need

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- Significance: This paper presents a promising approach to LLM self-improvement and could offer a valuable contribution.
- Clarity: Most of the figures in the paper are clear and the paper is generally well-written.

Weaknesses

There are several comments I would like the authors to address to make some details clearer and the paper more complete.

**Major comments**

1. Role Specialization: The paper introduces distinct roles for models (generation agents and critic agents). However, it would be helpful to clarify the specific objectives each role optimizes. Additionally, I suggest emphasizing that only two roles are used in this paper (generation and critic) to avoid confusion.
2. Zero-shot Generalization: In Section

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. Jointly optimizing the LLM in the roles of generators and critics appears to be a robust method for enhancing the reasoning ability of LLMs.
2. The work shows that finetuning multiple LLMs on independent datasets derived from multi-agent debate can preserve diversity, which is a critical challenge for LLM finetuning.
3. The evaluation results show the strength of the proposed method.

Weaknesses

1. The title “Multiagent Finetuning of Language Models” may imply a broader scope than the paper addresses. Multi-agent applications of language models can cover a much broader range of settings besides reasoning tasks and multi-agent debate, such as gaming and social simulation; however, this work focuses solely on multi-agent debate.
2. The terms “Single Agent” and “Multi Agent” are vague and unclear in this paper. For example, Sec 2.2 “Fine-tuning Single Agent” discusses scenarios involving

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies

Methods: Sparse Evolutionary Training · Balanced Selection