LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge

Songze Li; Chuokun Xu; Jiaying Wang; Xueluan Gong; Chen Chen; Jirui Zhang; Jun Wang; Kwok-Yan Lam; Shouling Ji

arXiv:2506.09443·cs.CR·November 18, 2025

LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge

Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji

PDF

Open Access 1 Repo

TL;DR

This paper introduces RobustJudge, a comprehensive framework for evaluating the robustness of LLM-as-a-Judge systems, revealing significant vulnerabilities, the impact of prompt design, and real-world deployment risks.

Contribution

The paper presents RobustJudge, a scalable automated framework for systematic robustness assessment of LLM-based judges, addressing existing fragmentation and exploring prompt and model effects.

Findings

01

LLM-as-a-Judge systems are highly vulnerable to adversarial attacks.

02

Defense strategies like re-tokenization improve robustness.

03

Robustness varies significantly with prompt template design.

Abstract

Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse tasks, driving the development and widespread adoption of LLM-as-a-Judge systems for automated evaluation, including red teaming and benchmarking. However, these systems are susceptible to adversarial attacks that can manipulate evaluation outcomes, raising critical concerns about their robustness and trustworthiness. Existing evaluation methods for LLM-based judges are often fragmented and lack a unified framework for comprehensive robustness assessment. Furthermore, the impact of prompt template design and model selection on judge robustness has rarely been explored, and their performance in real-world deployments remains largely unverified. To address these gaps, we introduce RobustJudge, a fully automated and scalable framework designed to systematically evaluate the robustness of LLM-as-a-Judge…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

s3ic-lab/robustjudge
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Hate Speech and Cyberbullying Detection