CollabEval: Enhancing LLM-as-a-Judge via Multi-Agent Collaboration

Yiyue Qian, Shinan Zhang, Yun Zhou, Haibo Ding, Diego Socolinsky, Yi Zhang

arXiv:2603.00993 · cs.AI · March 3, 2026

TL;DR

CollabEval introduces a multi-agent collaborative framework for LLM-based content evaluation that improves consistency and robustness over single-model methods through multi-phase discussion and strategic consensus checking.

Contribution

The paper presents a novel multi-agent collaborative evaluation framework that improves judgment accuracy and consistency over existing single-LLM approaches.

Findings

1. Outperforms single-LLM evaluation methods across multiple metrics.
2. Maintains robustness even when individual models face challenges.
3. Supports diverse evaluation criteria efficiently.

Abstract

Large Language Models (LLMs) have revolutionized AI-generated content evaluation, with the LLM-as-a-Judge paradigm becoming increasingly popular. However, current single-LLM evaluation approaches face significant challenges, including inconsistent judgments and inherent biases from pre-training data. To address these limitations, we propose CollabEval, a novel multi-agent evaluation framework that implements a three-phase Collaborative Evaluation process: initial evaluation, multi-round discussion, and final judgment. Unlike existing approaches that rely on competitive debate or single-model evaluation, CollabEval emphasizes collaboration among multiple agents with strategic consensus checking for efficiency. Our extensive experiments demonstrate that CollabEval consistently outperforms single-LLM approaches across multiple dimensions while maintaining robust performance even when…
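
The abstract names the three phases but not their internals. The Python sketch below illustrates one way the control flow could fit together, under stated assumptions: agents are plain callables standing in for LLM calls, "consensus" means that at least a `threshold` fraction of agents give the same score, the final judgment is a majority vote with a low-median tie-break, and discussion is capped at `max_rounds`. The names `Opinion`, `has_consensus`, `final_judgment`, and `collab_eval`, and the toy agents, are hypothetical and not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median_low
from typing import Callable, List, Optional, Tuple


@dataclass
class Opinion:
    agent: str
    score: int
    rationale: str


# Hypothetical agent interface: (item, peer opinions or None) -> Opinion.
# In a real system each agent would wrap an LLM call; here they are plain functions.
Agent = Callable[[str, Optional[List[Opinion]]], Opinion]


def has_consensus(opinions: List[Opinion], threshold: float) -> bool:
    """Assumed rule: consensus if the modal score is held by >= threshold of agents."""
    counts = Counter(o.score for o in opinions)
    _, top = counts.most_common(1)[0]
    return top / len(opinions) >= threshold


def final_judgment(opinions: List[Opinion]) -> int:
    """Assumed rule: majority score; if the top counts tie, take the low median."""
    ranked = Counter(o.score for o in opinions).most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    return median_low([o.score for o in opinions])


def collab_eval(item: str, agents: List[Agent], max_rounds: int = 3,
                threshold: float = 2 / 3) -> Tuple[int, int]:
    """Returns (final score, number of discussion rounds actually run)."""
    # Phase 1: each agent evaluates independently (no peer input).
    opinions = [agent(item, None) for agent in agents]
    rounds = 0
    # Phase 2: discussion rounds, skipped or stopped early once consensus holds.
    while rounds < max_rounds and not has_consensus(opinions, threshold):
        peers = list(opinions)  # every agent sees the previous round's opinions
        opinions = [agent(item, peers) for agent in agents]
        rounds += 1
    # Phase 3: aggregate the last round into a single verdict.
    return final_judgment(opinions), rounds


# Toy agents for illustration only.
def fixed(name: str, score: int) -> Agent:
    return lambda item, peers: Opinion(name, score, "fixed")


def persuadable(name: str, first: int) -> Agent:
    def agent(item: str, peers: Optional[List[Opinion]]) -> Opinion:
        if peers is None:
            return Opinion(name, first, "initial")
        majority = Counter(p.score for p in peers).most_common(1)[0][0]
        return Opinion(name, majority, "revised toward peers")
    return agent


agents = [fixed("a", 4), fixed("b", 4), persuadable("c", 2)]
print(collab_eval("candidate answer", agents, threshold=1.0))  # (4, 1)
print(collab_eval("candidate answer", agents))                 # (4, 0)
```

Under unanimity (`threshold=1.0`), the dissenting agent revises toward its peers after one discussion round. With the default two-thirds threshold, the initial scores already count as consensus, so the discussion phase is skipped entirely. That early exit is one plausible reading of the efficiency gain the abstract attributes to strategic consensus checking; the paper's actual criterion may differ.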

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education