Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework

Dong Wang; Yang Li; Ansong Ni; Ching-Feng Yeh; Youssef Emad; Xinjie Lei; Liam Robbins; Karthik Padthe; Hu Xu; Xian Li; Asli Celikyilmaz; Ramya Raghavendra; Lifei Huang; Carole-Jean Wu; Shang-Wen Li

arXiv:2511.21686 · cs.CL · April 21, 2026

TL;DR

Matrix is a decentralized, peer-to-peer framework for multi-agent synthetic data generation that scales efficiently and improves throughput without sacrificing quality.

Contribution

It introduces a scalable, flexible, and decentralized multi-agent synthesis framework built on Ray, eliminating the need for a central orchestrator.

Findings

01. Matrix achieves 2-15x higher data throughput across various scenarios.

02. The framework scales to tens of thousands of concurrent workflows.

03. Matrix maintains output quality while increasing efficiency.

Abstract

Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality, more diverse, and structurally richer. However, existing frameworks for multi-agent synthesis often depend on a centralized orchestrator, creating scalability bottlenecks, or are hardcoded for specific domains, limiting flexibility. We present Matrix, a decentralized framework that represents both control and data flow as serialized messages passed through distributed queues. This peer-to-peer design eliminates the central orchestrator. Each task progresses independently through lightweight agents, while compute-intensive operations, such as LLM inference or containerized…
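The message-passing idea in the abstract can be illustrated with a toy sketch. This is not Matrix's API and does not use Ray: the agent names (`writer`, `critic`, `sink`), the `ROUTE` table, and the `handle` and `run` functions are all hypothetical, and in-process `queue.Queue` objects stand in for distributed queues. What it shows is only the general pattern: each task's full state travels as a serialized message, and each agent forwards the message to the next peer itself, so no central loop tracks task progress.

```python
import json
import queue
import threading

# Hypothetical routing table: each agent knows only its next peer.
ROUTE = {"writer": "critic", "critic": "sink"}

# One queue per agent; stand-ins for distributed queues (assumption).
queues = {name: queue.Queue() for name in ("writer", "critic", "sink")}


def handle(role, state):
    """Toy stand-in for an agent's work (an LLM call in a real system)."""
    if role == "writer":
        state["draft"] = f"draft for task {state['task_id']}"
    elif role == "critic":
        state["approved"] = True
    state["history"].append(role)
    return state


def agent_loop(role):
    """Pull a serialized message, update it, and forward it to the next peer."""
    while True:
        raw = queues[role].get()
        if raw is None:  # shutdown sentinel
            break
        state = handle(role, json.loads(raw))
        queues[ROUTE[role]].put(json.dumps(state))


def run(num_tasks):
    """Seed tasks, collect finished messages from the sink, then stop agents."""
    workers = [threading.Thread(target=agent_loop, args=(r,)) for r in ROUTE]
    for w in workers:
        w.start()
    for i in range(num_tasks):
        queues["writer"].put(json.dumps({"task_id": i, "history": []}))
    results = [json.loads(queues["sink"].get()) for _ in range(num_tasks)]
    for r in ROUTE:
        queues[r].put(None)
    for w in workers:
        w.join()
    return sorted(results, key=lambda s: s["task_id"])
```

Calling `run(3)` pushes three tasks through both agents. The `run` function here only seeds inputs and collects outputs; the routing decision at each step is made by the agent that holds the message, which is the property the peer-to-peer design relies on.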
