MALT: Improving Reasoning with Multi-Agent LLM Training

Sumeet Ramesh Motwani; Chandler Smith; Rocktim Jyoti Das; Rafael Rafailov; Ivan Laptev; Philip H. S. Torr; Fabio Pizzati; Ronald Clark; Christian Schroeder de Witt

arXiv:2412.01928 · cs.LG · October 7, 2025 · 3 cites

TL;DR

MALT introduces a multi-agent training pipeline for LLMs that enhances reasoning by dividing tasks into generation, verification, and refinement, leading to significant performance improvements on reasoning benchmarks.

Contribution

The paper presents a novel multi-agent post-training strategy that automatically generates training data and improves reasoning capabilities of LLMs without human supervision.

Findings

01. MALT achieves an improvement of up to 15.66% on the MATH dataset.

02. MALT outperforms baseline LLMs on GSM8K and CSQA.

03. Multi-agent training significantly improves reasoning accuracy.

Abstract

Large Language Models (LLMs) often produce answers with a single chain-of-thought, which restricts their ability to explore reasoning paths or self-correct flawed outputs in complex tasks. In this paper, we introduce MALT (Multi-Agent LLM Training), a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps using a sequential pipeline of heterogeneous agents. During data generation, each agent is repeatedly sampled to form a multi-agent search tree, where final outputs are graded against ground-truth data. We then apply value iteration to propagate reward signals back to each role-conditioned model, automatically producing multi-agent post-training data without human or teacher-model supervision. Our off-policy approach allows each agent to specialize by learning from correct and incorrect trajectories, ultimately improving the…
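The data-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `build_tree`, `propagate`, and `collect_labels` are hypothetical, the `sample` callable stands in for querying a role-conditioned model, and two details are assumptions that the abstract does not spell out: the backup rule (each internal node takes the mean value of its children) and the 0.5 threshold used to split nodes into positive and negative examples.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    role: str                      # "generator", "verifier", or "refiner"
    text: str                      # the sampled output for this role
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0             # filled in by propagate()


def build_tree(question: str,
               sample: Callable[[str, str, int], List[str]],
               n: int = 2) -> List[Node]:
    """Sample n outputs per role, each conditioned on the path so far."""
    roots = []
    for g in sample("generator", question, n):
        g_node = Node("generator", g)
        for v in sample("verifier", "\n".join([question, g]), n):
            v_node = Node("verifier", v)
            for r in sample("refiner", "\n".join([question, g, v]), n):
                v_node.children.append(Node("refiner", r))
            g_node.children.append(v_node)
        roots.append(g_node)
    return roots


def propagate(node: Node, is_correct: Callable[[str], bool]) -> float:
    """Back up rewards: leaves are graded against ground truth,
    internal nodes take the mean of their children (an assumption)."""
    if not node.children:
        node.value = 1.0 if is_correct(node.text) else 0.0
    else:
        child_values = [propagate(c, is_correct) for c in node.children]
        node.value = sum(child_values) / len(child_values)
    return node.value


def collect_labels(roots: List[Node],
                   threshold: float = 0.5) -> List[Tuple[str, str, float, bool]]:
    """Flatten the tree into (role, text, value, is_positive) rows.
    The threshold is an assumed way to binarize node values."""
    rows, stack = [], list(roots)
    while stack:
        node = stack.pop()
        rows.append((node.role, node.text, node.value, node.value > threshold))
        stack.extend(node.children)
    return rows
```

With a branching factor of n, each question yields n generator nodes, n² verifier nodes, and n³ refiner leaves. Rows grouped by role could then serve as per-agent training data, with positive and negative rows under the same parent forming the correct and incorrect trajectories the abstract mentions.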

Taxonomy

Topics: Artificial Intelligence in Law · Multi-Agent Systems and Negotiation

Methods: LLaMA