SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems
Dong Zhang, Zhaowei Li, Pengyu Wang, Xin Zhang, Yaqian Zhou, Xipeng Qiu

TL;DR
SpeechAgents is a multi-modal multi-agent system that uses large language models to simulate human communication with realistic emotion and rhythm. It scales to many agents and supports applications such as drama creation and audio novels.
Contribution
The paper presents SpeechAgents, a novel multi-modal LLM-based multi-agent system for simulating human communication, along with Multi-Agent Tuning and a new benchmark for evaluation.
Findings
Effective simulation of human dialogue with authentic rhythm and emotions
Scalability to 25 agents in complex communication tasks
Successful application in drama creation and audio novels
Abstract
Human communication is a complex and diverse process that not only involves multiple factors such as language, commonsense, and cultural backgrounds but also requires the participation of multimodal information, such as speech. Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society. Can we leverage LLM-based multi-agent systems to simulate human communication? However, current LLM-based multi-agent systems mainly rely on text as the primary medium. In this paper, we propose SpeechAgents, a multi-modal LLM-based multi-agent system designed for simulating human communication. SpeechAgents uses a multi-modal LLM as the control center for each individual agent and employs multi-modal signals as the medium for messages exchanged among agents. Additionally, we propose Multi-Agent Tuning to enhance the multi-agent capabilities of…
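The agent structure described in the abstract (one controller per agent, with speech-like signals as the exchanged messages) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `SpeechMessage`, `Agent`, `simulate`, and `echo_policy` are hypothetical, the integer lists stand in for whatever speech representation the real system uses, the `respond` callable stands in for a multi-modal LLM, and the round-robin turn order is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechMessage:
    speaker: str
    units: List[int]  # placeholder for a speech signal (assumption, not the paper's format)


@dataclass
class Agent:
    name: str
    # Stand-in for the multi-modal LLM controller: maps the dialogue so far to new units.
    respond: Callable[[List[SpeechMessage]], List[int]]

    def speak(self, history: List[SpeechMessage]) -> SpeechMessage:
        return SpeechMessage(self.name, self.respond(history))


def simulate(agents: List[Agent], turns: int) -> List[SpeechMessage]:
    """Run a dialogue in which agents take turns in round-robin order (assumed)."""
    history: List[SpeechMessage] = []
    for t in range(turns):
        agent = agents[t % len(agents)]
        history.append(agent.speak(history))
    return history


def echo_policy(offset: int) -> Callable[[List[SpeechMessage]], List[int]]:
    """Toy controller: emits one unit derived from the history length."""
    def respond(history: List[SpeechMessage]) -> List[int]:
        return [len(history) + offset]
    return respond


agents = [Agent("A", echo_policy(0)), Agent("B", echo_policy(100))]
dialogue = simulate(agents, 4)
for message in dialogue:
    print(message.speaker, message.units)
```

Replacing `echo_policy` with a call to an actual model would leave the loop unchanged, which is the point of keeping the controller behind a plain callable.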
Taxonomy
Topics: Computational and Text Analysis Methods
