MARO: Learning Stronger Reasoning from Social Interaction
Yin Cai, Zhouhong Gu, Juntao Zhang, Ping Chen

TL;DR
This paper introduces MARO, a multi-agent social learning framework that enhances large language models' reasoning abilities through interaction, negotiation, and competition, leading to improved social and general reasoning skills.
Contribution
MARO is a novel training method that decomposes success signals, balances role training, and evaluates behaviors directly to improve LLM reasoning in social contexts.
Findings
MARO significantly improves social reasoning capabilities.
Social simulation learning enhances general reasoning skills.
Models trained with MARO transfer skills to mathematical reasoning and instruction-following tasks.
Abstract
Humans face countless scenarios that require reasoning and judgment in daily life. However, existing large language model training methods primarily allow models to learn from existing textual content or solve predetermined problems, lacking experience in real scenarios involving interaction, negotiation, and competition with others. To address this, this paper proposes Multi-Agent Reward Optimization (MARO), a method that enables large language models (LLMs) to acquire stronger reasoning abilities by learning and practicing in multi-agent social environments. Specifically, MARO first addresses the sparse learning signal problem by decomposing final success or failure outcomes into each specific behavior during the interaction process; second, it handles the uneven role distribution problem by balancing the training sample weights of different roles; finally, it addresses environmental…
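The first two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how outcomes are decomposed or how roles are balanced, so everything here is an assumption for illustration: the function names `decompose_outcome` and `role_balanced_weights` are hypothetical, discounted credit assignment stands in for the decomposition step, and inverse-frequency weighting stands in for the role-balancing step. It is not the authors' implementation.

```python
from collections import Counter


def decompose_outcome(num_steps, final_reward, discount=0.9):
    """Spread one final success/failure signal over every step of an episode.

    Assumption: discounted credit assignment, where steps closer to the
    outcome receive a larger share. The paper may use a different scheme.
    """
    return [final_reward * discount ** (num_steps - 1 - t) for t in range(num_steps)]


def role_balanced_weights(roles):
    """Give each training sample a weight inversely proportional to how
    often its role appears, so every role carries the same total weight.

    Assumption: inverse-frequency weighting, chosen only for illustration.
    """
    counts = Counter(roles)
    n_roles = len(counts)
    total = len(roles)
    return [total / (n_roles * counts[role]) for role in roles]


# Hypothetical episode: three behaviors followed by a successful outcome.
step_rewards = decompose_outcome(num_steps=3, final_reward=1.0, discount=0.5)

# Hypothetical batch in which one role is over-represented.
sample_roles = ["buyer", "buyer", "buyer", "seller"]
sample_weights = role_balanced_weights(sample_roles)
```

In this sketch the three buyer samples together carry the same total weight as the single seller sample, which is one simple way to keep a frequent role from dominating training.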
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Reinforcement Learning in Robotics
