LaaJMeter: A Framework for LaaJ Evaluation
Samuel Ackerman, Gal Amram, Ora Nova Fandina, Eitan Farchi, Shmulik Froimovich, Raviv Gal, Wesam Ibraheem, Avi Ziv

TL;DR
LaaJMeter is a simulation framework for systematically evaluating and validating LLM-as-a-Judge evaluators (LaaJs) in domain-specific NLP tasks, especially where data is scarce and expert annotation is costly.
Contribution
The paper introduces LaaJMeter, a novel simulation-based tool for controlled meta-evaluation of LaaJs, enabling validation of metrics and threshold estimation in low-resource domains.
Findings
Common meta-evaluation metrics are limited in how sensitively they reflect LaaJ quality.
LaaJMeter helps identify which metrics effectively capture LaaJ quality.
The framework supports scalable, domain-specific LaaJ assessment.
Abstract
Large Language Models (LLMs) are increasingly used as evaluators in natural language processing tasks, a paradigm known as LLM-as-a-Judge (LaaJ). The analysis of LaaJ software, commonly referred to as meta-evaluation, poses significant challenges in domain-specific contexts. In such domains, in contrast to general domains, annotated data is scarce and expert evaluation is costly. As a result, meta-evaluation is often performed using metrics that have not been validated for the specific domain in which they are applied. It therefore becomes difficult to determine which metrics effectively identify LaaJ quality and, further, what threshold indicates sufficient evaluator performance. In this work, we introduce LaaJMeter, a simulation-based framework for controlled meta-evaluation of LaaJs. LaaJMeter enables engineers to generate synthetic data representing virtual models and judges,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
