EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta
Raymond Bernard, Shaina Raza (PhD), Subhabrata Das (PhD), Rahul Murugan

TL;DR
The paper introduces EQUATOR, a deterministic evaluation framework for assessing LLM reasoning and factual accuracy in open-ended questions, reducing human effort and outperforming traditional methods.
Contribution
It presents a novel deterministic evaluation framework that combines vector databases and automated scoring to improve scalability and accuracy in LLM reasoning assessment.
Findings
EQUATOR outperforms traditional multiple-choice evaluations.
Automated evaluation with LLaMA 3.2B is effective and scalable.
The framework reduces reliance on human evaluators.
Abstract
Despite the remarkable coherence of Large Language Models (LLMs), existing evaluation methods often suffer from fluency bias and rely heavily on multiple-choice formats, making it difficult to assess factual accuracy and complex reasoning effectively. LLMs thus frequently generate factually inaccurate responses, especially in complex reasoning tasks, highlighting two prominent challenges: (1) the inadequacy of existing methods to evaluate reasoning and factual accuracy effectively, and (2) the reliance on human evaluators for nuanced judgment, as illustrated by Williams and Huckle (2024)[1], who found manual grading indispensable despite automated grading advancements. To address evaluation gaps in open-ended reasoning tasks, we introduce the EQUATOR Evaluator (Evaluation of Question Answering Thoroughness in Open-ended Reasoning). This framework combines deterministic scoring with a…
Taxonomy
Topics: Natural Language Processing Techniques · Software Engineering Research · Semantic Web and Ontologies
Methods: Focus · LLaMA
