CollabEval: Enhancing LLM-as-a-Judge via Multi-Agent Collaboration
Yiyue Qian, Shinan Zhang, Yun Zhou, Haibo Ding, Diego Socolinsky, Yi Zhang

TL;DR
CollabEval introduces a multi-agent collaborative framework for LLM-based content evaluation, improving consistency and robustness over single-model methods through strategic consensus and multi-phase discussion.
Contribution
It presents a novel multi-agent collaborative evaluation framework that enhances judgment accuracy and consistency compared to existing single-LLM approaches.
Findings
Outperforms single-LLM evaluation methods across multiple metrics
Maintains robustness even when individual models face challenges
Supports diverse evaluation criteria efficiently
Abstract
Large Language Models (LLMs) have revolutionized AI-generated content evaluation, with the LLM-as-a-Judge paradigm becoming increasingly popular. However, current single-LLM evaluation approaches face significant challenges, including inconsistent judgments and inherent biases from pre-training data. To address these limitations, we propose CollabEval, a novel multi-agent evaluation framework that implements a three-phase Collaborative Evaluation process: initial evaluation, multi-round discussion, and final judgment. Unlike existing approaches that rely on competitive debate or single-model evaluation, CollabEval emphasizes collaboration among multiple agents with strategic consensus checking for efficiency. Our extensive experiments demonstrate that CollabEval consistently outperforms single-LLM approaches across multiple dimensions while maintaining robust performance even when…
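The three-phase process named in the abstract (initial evaluation, multi-round discussion, final judgment, with consensus checks allowing an early exit) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The agent signature, the unanimity-based consensus rule, the `max_rounds` cap, and the helpers `make_stub_agent` and `mean_judge` are all assumptions introduced here. In practice each agent would wrap an LLM call.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical agent signature: (item, peer_scores) -> integer score.
# peer_scores is None in the initial phase; during discussion it holds
# the scores all agents gave in the previous round.
Agent = Callable[[str, Optional[List[int]]], int]
Judge = Callable[[str, List[int]], int]


def has_consensus(scores: List[int]) -> bool:
    """Assumed consensus rule: every agent gave the same score."""
    return len(set(scores)) == 1


def collab_eval(item: str, agents: List[Agent], final_judge: Judge,
                max_rounds: int = 3) -> int:
    # Phase 1: each agent scores the item independently.
    scores = [agent(item, None) for agent in agents]
    if has_consensus(scores):
        return scores[0]  # early exit, no discussion needed

    # Phase 2: agents see the previous round's scores and may revise.
    for _ in range(max_rounds):
        scores = [agent(item, scores) for agent in agents]
        if has_consensus(scores):
            return scores[0]

    # Phase 3: no consensus reached, so a final judge decides.
    return final_judge(item, scores)


def make_stub_agent(initial: int) -> Agent:
    """Stand-in for an LLM agent: starts at `initial`, then adopts the
    most common peer score during discussion."""
    def agent(item: str, peer_scores: Optional[List[int]]) -> int:
        if peer_scores is None:
            return initial
        return Counter(peer_scores).most_common(1)[0][0]
    return agent


def mean_judge(item: str, scores: List[int]) -> int:
    """Stand-in final judge: rounded mean of the last round's scores."""
    return round(sum(scores) / len(scores))


agents = [make_stub_agent(4), make_stub_agent(4), make_stub_agent(2)]
print(collab_eval("sample response", agents, mean_judge))  # 4
```

In this sketch the consensus check runs after every phase, so the final judge is only invoked when discussion fails to converge. That is one plausible reading of "strategic consensus checking for efficiency".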
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education
