Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Tian Jin, Xiaowen Dong, Yanfeng Wang, Siheng Chen

TL;DR
This paper introduces MATRIX, a multi-agent simulation framework that automatically generates diverse, realistic instruction data for large language models, significantly reducing data collection costs while improving performance.
Contribution
The paper presents a novel multi-agent simulation approach and a scenario-driven instruction generator for efficient, high-quality data synthesis for LLM post-training.
Findings
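Synthesized data improves LLM performance on the AlpacaEval 2 and Arena-Hard benchmarks.
Llama-3-8B-Base post-trained on 20K synthesized pairs outperforms Llama-3-8B-Instruct, which was tuned on over 10M pairs.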
Framework enables scalable, realistic data generation.
Abstract
Post-training is essential for enabling large language models (LLMs) to follow human instructions. However, its effectiveness depends on high-quality instruction data, which is challenging to obtain in the real world due to privacy concerns, data scarcity, and high annotation costs. To fill this gap, inspired by the recent success of using LLMs to simulate human society, we propose MATRIX, a multi-agent simulator that automatically generates diverse text-based scenarios, capturing a wide range of real-world human needs in a realistic and scalable manner. Leveraging these outputs, we introduce a novel scenario-driven instruction generator MATRIX-Gen for controllable and highly realistic data synthesis. Extensive experiments demonstrate that our framework effectively generates both general and domain-specific data. On AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base, post-trained…
Peer Reviews
Decision · Submitted to ICLR 2025
The overall idea of creating a multi-agent simulator based on real-world human profiles is interesting. Grounding data synthesis in real-world human behavior can result in more realistic data. Specifically, the proposed approach goes beyond recent PersonaHub and leverages interactions between agents to create complex and diverse scenarios. Experimental results show that the synthetic datasets generated by the proposed approach are more effective than various existing real and synthetic SFT/DPO
Presentation: The main problem with this paper is a lack of the specifics needed to fully understand and replicate the approach. The paper describes the components of the proposed multi-agent simulator at a high level without providing concrete details. The proposed approach uses 1000 agents created based on real-world human profiles. Many things are unclear from the paper: -- What kind of user profiles are used in the simulator? What is the distribution? The paper does not provide details about these profiles
(1) Comprehensive Experiments: This work features extensive experiments and baseline comparisons, including evaluations against models like MagPie and WildChat on datasets such as UltraFeedback and Orca. Beyond general instruction tuning performance on benchmarks like AlpacaEval 2 and Arena-Hard, it also compares results across domain-specific tasks like multi-turn dialogue, coding, and safety. These evaluations demonstrate MATRIX’s robust capability and data efficiency across diverse domains.
(1) The instruction tuning process of Llama-3-8B-Instruct encompasses a broader range of capabilities compared to AlpacaEval, including multilingual abilities, self-recognition of identity, and more nuanced safety settings. Consequently, it’s not fair to directly compare over 10 million instruction-tuning data points with 20,000 data points mentioned in the paper. (2) Additionally, AlpacaEval itself compares model responses using GPT-4-0314 as a reference. However, OpenAI models are also used t
1. The authors' motivation is clear. 2. The experimental section of the paper is quite detailed. 3. The methods section is thoroughly explained.
1. As far as I know, generating data through multi-agent interaction based on role-playing under predefined scenarios is widely used. Although the authors mention some distinctions from related works in the paper, I believe they need to further elaborate on the novelty of their work, particularly in comparison to other role-playing-based data generation or multi-agent interaction methods, such as AutoGen [1]. 2. The experiments in this paper are conducted only using the LLaMA-3-8B as the base mo
Taxonomy
Topics: Simulation Techniques and Applications · Business Process Modeling and Analysis
