SedarEval: Automated Evaluation using Self-Adaptive Rubrics

Zhiyuan Fan; Weinong Wang; Xing Wu; Debing Zhang

arXiv:2501.15595 · cs.CV · January 28, 2025

TL;DR

SedarEval introduces a self-adaptive rubric-based evaluation paradigm for LLM outputs, creating detailed, question-specific scoring rubrics and training a specialized evaluator model that surpasses existing methods in accuracy and consistency.

Contribution

The paper presents a novel self-adaptive rubric framework and a comprehensive benchmark, SedarEval, with a trained evaluator LM that outperforms existing evaluation paradigms.

Findings

01. Evaluator LM achieves higher concordance with human grading than GPT-4.

02. SedarEval covers diverse domains including math, coding, and reasoning.

03. Self-adaptive rubrics improve evaluation precision and stability.

Abstract

The LLM-as-judge evaluation paradigm has gained popularity because it significantly reduces human labor and time costs. This approach uses one or more large language models (LLMs) to assess the quality of outputs from other LLMs. However, existing methods rely on generic scoring rubrics that fail to consider the specifics of each question and its problem-solving process, compromising the precision and stability of assessments. Inspired by human examination scoring processes, we propose a new evaluation paradigm based on self-adaptive rubrics. Specifically, we create detailed scoring rubrics for each question, capturing the primary and secondary criteria in a structured format of scoring and deduction points that mimics a human evaluator's analytical process. Building on this paradigm, we further develop a novel benchmark called SedarEval, which covers a range of domains including…
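
To make the idea of a question-specific rubric with scoring and deduction points concrete, here is a minimal sketch in Python. It is not the paper's format or implementation: the class names (`RubricItem`, `Rubric`), the helper `score_answer`, the example question, the point values, and the clamping of the final score to the range 0 to `max_score` are all assumptions made for illustration. In the paper an evaluator LM judges the answer; in this sketch the judgment of which points are met is simply passed in as sets of indices.

```python
from dataclasses import dataclass, field


@dataclass
class RubricItem:
    """One criterion in a rubric (hypothetical structure)."""
    description: str
    points: float  # magnitude of points awarded or deducted


@dataclass
class Rubric:
    """A rubric written for one specific question (hypothetical structure)."""
    question: str
    max_score: float
    scoring_points: list[RubricItem] = field(default_factory=list)
    deduction_points: list[RubricItem] = field(default_factory=list)


def score_answer(rubric: Rubric, met_scoring: set[int], triggered_deductions: set[int]) -> float:
    """Sum the scoring points that were met, subtract the deductions that were
    triggered, and clamp the result to [0, max_score] (clamping is an assumption)."""
    earned = sum(
        item.points for i, item in enumerate(rubric.scoring_points) if i in met_scoring
    )
    lost = sum(
        item.points for i, item in enumerate(rubric.deduction_points) if i in triggered_deductions
    )
    return max(0.0, min(rubric.max_score, earned - lost))


# Illustrative rubric; the question and point values are invented for this sketch.
rubric = Rubric(
    question="Solve 2x + 6 = 10 for x.",
    max_score=5.0,
    scoring_points=[
        RubricItem("Isolates the variable term: 2x = 4", 2.0),
        RubricItem("States the final answer x = 2", 3.0),
    ],
    deduction_points=[
        RubricItem("Arithmetic error in an intermediate step", 1.0),
    ],
)

# An answer that meets both scoring points but contains one arithmetic slip.
final_score = score_answer(rubric, met_scoring={0, 1}, triggered_deductions={0})
```

Keeping scoring and deduction points as separate lists mirrors the abstract's description of the rubric structure; how the criteria are weighted and how the evaluator decides which ones apply are left open here.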

Code & Models

Repositories

wwn1233/sedareval (official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning

Methods: Softmax · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing