MMAC-Copilot: Multi-modal Agent Collaboration Operating Copilot
Zirui Song, Yaohang Li, Meng Fang, Yanda Li, Zhenhao Chen, Zecheng Shi, Yuan Huang, Xiuying Chen, Ling Chen

TL;DR
MMAC-Copilot enhances multi-modal agent collaboration by drawing on the expertise of diverse agents, improving interaction with applications and reducing hallucinations across various application domains, as demonstrated on the GAIA and VIBench benchmarks.
Contribution
This work introduces the Multi-Modal Agent Collaboration framework (MMAC-Copilot), enabling multi-agent teamwork to improve application interaction and reduce hallucinations in large language model agents.
Findings
Achieved an average 6.8% improvement over existing leading systems on the GAIA benchmark.
Demonstrated strong capabilities on VIBench for non-API applications.
Showcased effective multi-agent collaboration across diverse domains.
Abstract
Large language model agents that interact with PC applications often face limitations due to their singular mode of interaction with real-world environments, leading to restricted versatility and frequent hallucinations. To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), a framework that utilizes the collective expertise of diverse agents to enhance their ability to interact with applications. The framework introduces a team collaboration chain, enabling each participating agent to contribute insights based on its specific domain knowledge, effectively reducing the hallucinations associated with knowledge domain gaps. We evaluate MMAC-Copilot using the GAIA benchmark and our newly introduced Visual Interaction Benchmark (VIBench). MMAC-Copilot achieved exceptional performance on GAIA, with an average improvement of 6.8% over existing leading systems. VIBench…
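To make the idea of a team collaboration chain concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes that a plan is a list of (domain, subtask) pairs, that each subtask is routed to the first agent whose declared domains cover it, and that every agent sees a shared context holding the earlier agents' contributions. The class names `Agent` and `CollaborationChain`, the agent names "Searcher" and "Viewer", and the domain labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical agent: a named expert covering some knowledge domains.
# `act` maps (subtask, shared context) to a contribution string.
@dataclass
class Agent:
    name: str
    domains: set[str]
    act: Callable[[str, list[str]], str]


# Hypothetical chain: routes each subtask to a matching expert and
# accumulates every contribution in a shared context list.
@dataclass
class CollaborationChain:
    agents: list[Agent]
    context: list[str] = field(default_factory=list)

    def route(self, domain: str) -> Agent:
        # Assumed routing rule: first agent whose domains include `domain`.
        for agent in self.agents:
            if domain in agent.domains:
                return agent
        # No expert covers this domain: fail loudly rather than guess.
        raise LookupError(f"no agent covers domain {domain!r}")

    def run(self, plan: list[tuple[str, str]]) -> list[str]:
        for domain, subtask in plan:
            agent = self.route(domain)
            contribution = agent.act(subtask, self.context)
            # Later agents can build on this entry.
            self.context.append(f"{agent.name}: {contribution}")
        return self.context


# Usage with two toy agents standing in for real domain experts.
searcher = Agent("Searcher", {"web"}, lambda task, ctx: f"found facts for {task}")
viewer = Agent("Viewer", {"gui"}, lambda task, ctx: f"acted with {len(ctx)} prior note(s)")
chain = CollaborationChain([searcher, viewer])
log = chain.run([("web", "look up settings path"), ("gui", "open settings")])
print(log)
```

Routing each subtask only to an agent that declares the relevant domain, and raising an error when none does, is one simple way to avoid asking an agent to answer outside its expertise; a real system would replace the lambdas with model calls and use a learned or prompted planner instead of a fixed plan.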
Taxonomy
Topics: Semantic Web and Ontologies · Multi-Agent Systems and Negotiation · Service-Oriented Architecture and Web Services
