YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering

Jennifer D'Souza; Hamed Babaei Giglou; Quentin Münch

arXiv:2505.14279·cs.CL·May 30, 2025

Open Access · 2 Models

TL;DR

YESciEval is a framework that enhances the robustness and reliability of LLM-based evaluation for scientific question answering by combining rubric-based assessment with reinforcement learning, enabling scalable and transparent evaluation.

Contribution

It introduces a novel evaluation framework that reduces bias and improves the reliability of LLMs as judges in scientific QA, independent of proprietary models and human feedback.

Findings

01. YESciEval improves evaluation consistency across models.
02. The framework reduces optimism bias in LLM evaluators.
03. It supports scalable, cost-free scientific QA assessment.

Abstract

Large Language Models (LLMs) drive scientific question-answering on modern search engines, yet their evaluation robustness remains underexplored. We introduce YESciEval, an open-source framework that combines fine-grained rubric-based assessment with reinforcement learning to mitigate optimism bias in LLM evaluators. We release multidisciplinary science Q&A datasets, including adversarial variants, with evaluation scores from multiple LLMs. Independent of proprietary models and human feedback, our approach enables scalable, cost-free evaluation. By advancing reliable LLM-as-a-judge models, this work supports AI alignment and fosters robust, transparent evaluation essential for scientific inquiry.
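
One way to make "optimism bias" concrete is to compare the scores a judge gives to original answers against the scores it gives to adversarial variants of the same answers. The sketch below is only an illustration under stated assumptions: the rubric names, the 1-5 scale, the numbers, and the helper functions `mean_by_variant`, `optimism_gap`, and `per_rubric_gap` are all hypothetical and are not taken from the paper or its released datasets. A judge that barely lowers its scores on adversarial answers would show a gap close to zero.

```python
from statistics import mean

# Hypothetical judge outputs: (rubric, variant, score on an assumed 1-5 scale).
# Rubric names and values are illustrative only, not data from the paper.
scores = [
    ("coherence", "original", 5),
    ("coherence", "adversarial", 4),
    ("relevance", "original", 4),
    ("relevance", "adversarial", 4),
    ("correctness", "original", 5),
    ("correctness", "adversarial", 2),
]


def mean_by_variant(records, variant):
    """Average score over all records of one variant."""
    return mean(score for _, v, score in records if v == variant)


def optimism_gap(records):
    """Mean original score minus mean adversarial score.

    A gap near zero suggests the judge does not penalize degraded
    answers, which is one simple reading of optimism bias.
    """
    return mean_by_variant(records, "original") - mean_by_variant(records, "adversarial")


def per_rubric_gap(records):
    """Same gap, computed separately for each rubric."""
    rubrics = sorted({rubric for rubric, _, _ in records})
    return {
        rubric: optimism_gap([r for r in records if r[0] == rubric])
        for rubric in rubrics
    }


gap = optimism_gap(scores)
gaps = per_rubric_gap(scores)
print(f"overall gap: {gap:.2f}")
print(gaps)
```

In this toy data the judge separates original from adversarial answers on the hypothetical "correctness" rubric but not on "relevance", which is the kind of per-rubric pattern a fine-grained rubric is meant to expose.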

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Advanced Graph Neural Networks