LaajMeter: A Framework for LaaJ Evaluation

Samuel Ackerman; Gal Amram; Ora Nova Fandina; Eitan Farchi; Shmulik Froimovich; Raviv Gal; Wesam Ibraheem; Avi Ziv

arXiv:2508.10161 · cs.CL · November 26, 2025

TL;DR

LaaJMeter is a simulation framework for systematically evaluating and validating LLM-as-a-Judge evaluators (LaaJs) in domain-specific NLP tasks, especially where data is scarce and expert annotation is costly.

Contribution

The paper introduces LaaJMeter, a novel simulation-based tool for controlled meta-evaluation of LaaJs, enabling validation of metrics and threshold estimation in low-resource domains.

Findings

01. Common meta-evaluation metrics are limited in their sensitivity to LaaJ quality.

02. LaaJMeter helps identify which metrics effectively capture LaaJ quality.

03. The framework supports scalable, domain-specific LaaJ assessment.

Abstract

Large Language Models (LLMs) are increasingly used as evaluators in natural language processing tasks, a paradigm known as LLM-as-a-Judge (LaaJ). The analysis of LaaJ software, commonly referred to as meta-evaluation, poses significant challenges in domain-specific contexts. In such domains, in contrast to general domains, annotated data is scarce and expert evaluation is costly. As a result, meta-evaluation is often performed using metrics that have not been validated for the specific domain in which they are applied. It therefore becomes difficult to determine which metrics effectively identify LaaJ quality and, further, what threshold indicates sufficient evaluator performance. In this work, we introduce LaaJMeter, a simulation-based framework for controlled meta-evaluation of LaaJs. LaaJMeter enables engineers to generate synthetic data representing virtual models and judges,…
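
The idea of simulating virtual judges can be illustrated with a small sketch. It is not LaaJMeter's actual method, which the truncated abstract does not specify. It assumes a hypothetical setup in which each item has a hidden true quality score, a virtual judge reports that score plus Gaussian noise, and a candidate meta-evaluation metric (here, pairwise ranking agreement) is checked for how strongly it reacts as the judge degrades. All function names, the noise model, and the metric are illustrative assumptions.

```python
import random


def simulate_true_scores(n_items, seed=0):
    """Hypothetical ground truth: one hidden quality score in [0, 1] per item."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) for _ in range(n_items)]


def virtual_judge(true_scores, noise, seed=0):
    """Assumed judge model: true score plus Gaussian noise, clipped to [0, 1].

    A larger `noise` stands for a lower-quality judge.
    """
    rng = random.Random(seed)
    return [min(1.0, max(0.0, s + rng.gauss(0.0, noise))) for s in true_scores]


def pairwise_agreement(true_scores, judge_scores):
    """Fraction of item pairs that the judge ranks in the same order as the truth.

    Pairs tied in the ground truth are skipped; pairs tied by the judge
    count as disagreements.
    """
    agree = 0
    total = 0
    n = len(true_scores)
    for i in range(n):
        for j in range(i + 1, n):
            true_diff = true_scores[i] - true_scores[j]
            if true_diff == 0:
                continue
            total += 1
            if true_diff * (judge_scores[i] - judge_scores[j]) > 0:
                agree += 1
    return agree / total if total else 0.0


if __name__ == "__main__":
    truth = simulate_true_scores(200, seed=1)
    # Sweep judge quality and watch how the metric responds.
    for noise in (0.0, 0.1, 0.3, 0.6):
        scores = virtual_judge(truth, noise, seed=2)
        print(f"noise={noise:.1f}  agreement={pairwise_agreement(truth, scores):.3f}")
```

Because the judge's quality is known by construction, a sweep like this shows whether a metric separates good judges from poor ones, and roughly where a pass/fail threshold might sit, without requiring expert-annotated data.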

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education