CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs

Chelsea Zou; Yiheng Yao; Selena She; Robert D. Hawkins

arXiv:2605.09823 · cs.MA · May 12, 2026

TL;DR

CalBench is a controlled environment for studying multi-agent coordination, privacy, and communication efficiency in calendar scheduling tasks with private information boundaries.

Contribution

CalBench is a decentralized benchmark for evaluating multi-agent coordination, privacy-preserving negotiation, and communication efficiency in calendar scheduling scenarios.

Findings

01

CalBench enables precise measurement of coordination quality and privacy leakage.

02

The environment supports comparison of decentralized protocols with an oracle baseline.

03

CalBench supports the study of fairness and communication efficiency in multi-agent systems.

Abstract

We introduce CalBench, a controlled evaluation environment for studying multi-agent coordination through calendar scheduling. In CalBench, N agents each manage a private calendar containing pre-existing commitments and must coordinate to schedule a stream of M incoming meetings while minimizing disruption costs. Because agents observe only their own calendars, successful scheduling requires communication across private information boundaries. Each scenario is generated with an oracle solution, enabling precise measurement of coordination quality via realized-to-optimal cost, as well as a Distributed Constraint Optimization (DCOP) baseline to provide a fair comparison under the same private-information constraints. CalBench enables precise verification of task success, communication efficiency, and fairness in the distribution of disruption costs. Our environment also studies…
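The realized-to-optimal cost mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Outcome` class, the `cost_ratio` and `cost_spread` functions, and the sample numbers are all hypothetical, and the max-minus-min spread is only an assumed stand-in for whatever fairness measure the paper actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Per-agent disruption costs for one scenario (hypothetical structure)."""
    agent_costs: tuple[float, ...]

    @property
    def total(self) -> float:
        # Total disruption cost summed over all agents.
        return sum(self.agent_costs)


def cost_ratio(realized: Outcome, oracle: Outcome) -> float:
    """Realized-to-optimal cost; 1.0 means the agents matched the oracle."""
    if oracle.total == 0:
        # Guard against division by zero when the oracle incurs no cost.
        return 1.0 if realized.total == 0 else float("inf")
    return realized.total / oracle.total


def cost_spread(outcome: Outcome) -> float:
    """Assumed fairness proxy: gap between the most and least disrupted agent."""
    return max(outcome.agent_costs) - min(outcome.agent_costs)


# Illustrative numbers only, not results from the paper.
realized = Outcome(agent_costs=(3.0, 1.0, 2.0))
oracle = Outcome(agent_costs=(1.0, 1.0, 2.0))

ratio = cost_ratio(realized, oracle)
spread = cost_spread(realized)
print(f"realized-to-optimal ratio: {ratio:.2f}, cost spread: {spread:.1f}")
```

Under these assumptions, a ratio close to 1.0 indicates that the decentralized agents nearly matched the oracle schedule, while a larger spread indicates that disruption fell unevenly across agents.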
