MinosEval: Distinguishing Factoid and Non-Factoid for Tailored Open-Ended QA Evaluation with LLMs
Yongqi Fan, Yating Wang, Guandong Wang, Jie Zhai, Jingping Liu, Qi Ye, Tong Ruan

TL;DR
MinosEval introduces a question-type-aware evaluation framework for open-ended QA that improves alignment with human judgment and interpretability by distinguishing factoid from non-factoid questions and applying tailored scoring strategies.
Contribution
MinosEval is presented as the first framework to explicitly differentiate question types in open-ended QA evaluation. It applies separate adaptive scoring methods to factoid and non-factoid questions to improve interpretability and agreement with human judgment.
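The general idea of routing by question type can be sketched as below. This is a minimal illustration, not the paper's method: the keyword classifier, the `FACTOID_CUES` list, the `score_factoid` key-fact coverage scorer, and the `evaluate` dispatcher are all hypothetical stand-ins, since the summary does not specify how MinosEval classifies questions or computes scores.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cue list: a toy stand-in for a real question-type classifier.
FACTOID_CUES = ("who", "when", "where", "which", "how many", "how much")


def classify_question(question: str) -> str:
    """Label a question 'factoid' or 'non-factoid' using a keyword heuristic."""
    q = question.lower().strip()
    return "factoid" if q.startswith(FACTOID_CUES) else "non-factoid"


def score_factoid(response: str, key_facts: List[str]) -> float:
    """Fraction of reference key facts that appear verbatim in the response."""
    if not key_facts:
        return 0.0
    text = response.lower()
    hits = sum(1 for fact in key_facts if fact.lower() in text)
    return hits / len(key_facts)


def evaluate(
    question: str,
    response: str,
    scorers: Dict[str, Callable[[str], float]],
) -> Tuple[str, float]:
    """Dispatch the response to the scorer registered for its question type."""
    qtype = classify_question(question)
    return qtype, scorers[qtype](response)


# Usage: the non-factoid scorer here is a constant placeholder (assumption).
scorers = {
    "factoid": lambda r: score_factoid(r, ["Shakespeare"]),
    "non-factoid": lambda r: 0.5,
}
result = evaluate("Who wrote Hamlet?", "It was William Shakespeare.", scorers)
```

Reporting the detected question type alongside the score is one simple way to make the evaluation easier to interpret, since the reader can see which strategy produced the number.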
Findings
MinosEval outperforms traditional metrics in correlating with human judgments.
It provides more interpretable evaluation results through question-type differentiation.
Experiments on multiple datasets demonstrate its robustness and effectiveness.
Abstract
Open-ended question answering (QA) is a key task for evaluating the capabilities of large language models (LLMs). Compared to closed-ended QA, it demands longer answer statements, more nuanced reasoning processes, and diverse expressions, making refined and interpretable automatic evaluation both crucial and challenging. Traditional metrics like ROUGE and BERTScore struggle to capture semantic similarities due to different patterns between model responses and reference answers. Current LLM-based evaluation approaches, such as pairwise or listwise comparisons of candidate answers, lack intuitive interpretability. While pointwise scoring of each response provides some descriptions, it fails to adapt across different question contents. Most notably, existing methods overlook the distinction between factoid and non-factoid questions. To address these challenges, we propose…
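The abstract's point about lexical-overlap metrics can be illustrated with a simplified unigram-overlap score in the spirit of ROUGE-1. This is a sketch under stated assumptions: `rouge1_f1` is a hypothetical helper that uses lowercasing and whitespace tokenization only (no stemming or other preprocessing found in full ROUGE implementations), and the example sentences are invented for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at its minimum frequency.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the medication lowers blood pressure"
paraphrase = "this drug reduces hypertension"   # same meaning, no shared words
verbatim = "the medication lowers blood pressure"

paraphrase_score = rouge1_f1(paraphrase, reference)  # 0.0
verbatim_score = rouge1_f1(verbatim, reference)      # 1.0
```

A semantically equivalent paraphrase scores zero because it shares no tokens with the reference, which is the kind of mismatch between response and reference patterns that the abstract describes.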
Taxonomy
Topics: Natural Language Processing Techniques
