Security in LLM-as-a-Judge: A Comprehensive SoK

Aiman Al Masoud; Antony Anju; Marco Arazzi; Mert Cihangiroglu; Vignesh Kumar Kembu; Serena Nicolazzo; Antonino Nocera; Vinod P.; Saraga Sakthidharan

arXiv:2603.29403 · cs.CR · April 7, 2026

TL;DR

This paper systematically reviews security challenges in LLM-as-a-Judge systems, highlighting vulnerabilities, attack methods, defenses, and future research directions to enhance their robustness and trustworthiness.

Contribution

The paper provides the first comprehensive Systematization of Knowledge (SoK) on security issues in LLM-as-a-Judge, including a taxonomy, a literature analysis, and a set of open challenges.

Findings

01. Significant vulnerabilities exist in LLM-based evaluation frameworks.

02. Existing defenses are limited and need improvement.

03. Open research challenges include robustness and attack detection.

Abstract

LLM-as-a-Judge (LaaJ) is a novel paradigm in which powerful language models are used to assess the quality, safety, or correctness of generated outputs. While this paradigm has significantly improved the scalability and efficiency of evaluation processes, it also introduces novel security risks and reliability concerns that remain largely unexplored. In particular, LLM-based judges can become both targets of adversarial manipulation and instruments through which attacks are conducted, potentially compromising the trustworthiness of evaluation pipelines. In this paper, we present the first Systematization of Knowledge (SoK) focusing on the security aspects of LLM-as-a-Judge systems. We perform a comprehensive literature review across major academic databases, analyzing 863 works and selecting 45 relevant studies published between 2020 and 2026. Based on this study, we propose a taxonomy…
