RULERS: Locked Rubrics and Evidence-Anchored Scoring for Robust LLM Evaluation

Yihan Hong; Huaiyuan Yao; Bolin Shen; Wanpeng Xu; Hua Wei; Yushun Dong

arXiv:2601.08654 · cs.CL · January 14, 2026

Open Access

TL;DR

RULERS introduces a framework that transforms natural language rubrics into executable specifications, enabling more reliable, stable, and scalable LLM evaluation by enforcing structured criteria, evidence verification, and calibration without model retraining.

Contribution

The paper presents RULERS, a compiler-executor system that converts natural language rubrics into executable specifications, targeting the stability and alignment problems of LLM-based evaluation.
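To make the idea of a locked rubric concrete, here is a minimal sketch of compiling criteria into an immutable, versioned bundle with a content hash. It is an illustration only, not the authors' implementation: the names `RubricBundle` and `compile_rubric`, the tuple layout of a criterion, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricBundle:
    """Hypothetical locked rubric: frozen fields plus a content digest."""
    version: str
    criteria: tuple  # tuple of (name, description, max_points) tuples
    digest: str


def compile_rubric(version, criteria):
    """Serialize the rubric canonically and hash it (assumed scheme).

    Any change to the wording, the points, or the version yields a
    different digest, so a judge run can record exactly which rubric
    it was scored against.
    """
    locked = tuple(tuple(c) for c in criteria)
    canonical = json.dumps(
        {"version": version, "criteria": [list(c) for c in locked]},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return RubricBundle(version=version, criteria=locked, digest=digest)


bundle = compile_rubric("1.0", [("clarity", "Is the argument clear?", 5)])
print(bundle.version, bundle.digest[:12])
```

Because the dataclass is frozen, assigning to a field after compilation raises an error, and two bundles built from identical text share the same digest.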

Findings

01. RULERS outperforms baselines in human agreement metrics.

02. It maintains stability against rubric perturbations.

03. Smaller models can match larger judges using RULERS.

Abstract

The LLM-as-a-Judge paradigm promises scalable rubric-based evaluation, yet aligning frozen black-box models with human standards remains a challenge due to inherent generation stochasticity. We reframe judge alignment as a criteria transfer problem and isolate three recurrent failure modes: rubric instability caused by prompt sensitivity, unverifiable reasoning that lacks auditable evidence, and scale misalignment with human grading boundaries. To address these issues, we introduce RULERS (Rubric Unification, Locking, and Evidence-anchored Robust Scoring), a compiler-executor framework that transforms natural language rubrics into executable specifications. RULERS operates by compiling criteria into versioned immutable bundles, enforcing structured decoding with deterministic evidence verification, and applying lightweight Wasserstein-based post-hoc calibration, all without updating…
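Two of the steps named in the abstract can be sketched in a few lines. The first function checks that every evidence quote a judge cites appears verbatim (up to whitespace) in the graded response. The second fits a quantile map from judge scores to human scores; in one dimension, this monotone quantile matching is the optimal transport map under the Wasserstein distance, but whether the paper's calibration takes this exact form is an assumption here. The function names and the whitespace normalization are hypothetical choices for this example.

```python
from bisect import bisect_right


def _normalize(text):
    """Collapse runs of whitespace so line breaks do not break matching."""
    return " ".join(text.split())


def verify_evidence(response, quotes):
    """Return True only if every non-empty quote occurs in the response.

    A deterministic string check: no model call is involved.
    An empty quote list counts as unverified (assumed policy).
    """
    haystack = _normalize(response)
    needles = [_normalize(q) for q in quotes]
    return bool(needles) and all(n and n in haystack for n in needles)


def fit_quantile_map(judge_scores, human_scores):
    """Build a calibrator that maps a judge score to the human score
    at the same empirical quantile (1D optimal transport sketch)."""
    js = sorted(judge_scores)
    hs = sorted(human_scores)
    n, m = len(js), len(hs)

    def calibrate(x):
        k = bisect_right(js, x)          # number of judge scores <= x
        idx = (k * m + n - 1) // n - 1   # ceil(k * m / n) - 1, integer math
        return hs[min(m - 1, max(0, idx))]

    return calibrate


calibrate = fit_quantile_map([1, 2, 3, 4], [2, 4, 6, 8])
print(verify_evidence("The model is\nrobust.", ["model is robust"]))
print(calibrate(3))
```

Neither step touches model parameters: the evidence check is a string operation and the calibrator is fit post hoc on paired score samples.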

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling