LINKAGE: Listwise Ranking among Varied-Quality References for Non-Factoid QA Evaluation via LLMs

Sihui Yang; Keping Bi; Wanqing Cui; Jiafeng Guo; Xueqi Cheng

arXiv:2409.14744·cs.CL·October 1, 2024

Open Access

TL;DR

This paper introduces LINKAGE, a listwise ranking method using LLMs for non-factoid QA evaluation, improving correlation with human judgment by ranking multiple reference answers of varied quality.

Contribution

The paper presents a novel listwise approach leveraging LLMs for NFQA evaluation, including generating reference answer lists for questions lacking multiple references.

Findings

01. Significantly higher correlation with human annotations compared to existing metrics.

02. Outperforms pointwise and pairwise approaches on three NFQA datasets.

03. Effective in evaluating answers without gold standard references.
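Agreement between an automatic metric and human annotations is typically reported as a rank correlation. The sketch below computes Kendall's tau-a in plain Python for paired score lists. The choice of statistic and the function name `kendall_tau_a` are assumptions made for illustration; this page does not say which correlation measure the paper reports.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-a between two paired score lists (ties count as neither)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant when both lists order items i and j the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy, made-up scores for four answers (not data from the paper).
metric_scores = [0.9, 0.4, 0.7, 0.1]
human_scores = [5, 2, 3, 1]
tau = kendall_tau_a(metric_scores, human_scores)
```

A value near 1 means the metric orders answers the way the annotators do, and a value near -1 means it orders them in reverse.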

Abstract

Non-Factoid (NF) Question Answering (QA) is challenging to evaluate because potential answers are diverse and there is no objective criterion. Commonly used automatic evaluation metrics such as ROUGE or BERTScore cannot accurately measure semantic similarity or assess answers from different perspectives. Recently, Large Language Models (LLMs) have been adopted for NFQA evaluation because of their compelling performance on various NLP tasks. Common approaches include pointwise scoring of each candidate answer and pairwise comparisons between answers. Inspired by the evolution from pointwise to pairwise to listwise in learning-to-rank methods, we propose a novel listwise NFQA evaluation approach that utilizes LLMs to rank candidate answers in a list of reference answers sorted by descending quality. Moreover, for NF questions that do not have multi-grade or any golden answers, we leverage LLMs to generate…
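The listwise idea in the abstract, placing a candidate answer within a list of references sorted by descending quality, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the prompt wording, the function names `build_listwise_prompt` and `rank_candidate`, and the `llm` callable (any function that maps a prompt string to a reply string) are all hypothetical.

```python
import re
from typing import Callable, List


def build_listwise_prompt(question: str, references: List[str], candidate: str) -> str:
    """Format a question, quality-sorted references, and one candidate as a prompt."""
    lines = [
        "References are listed from best (1) to worst.",
        f"Question: {question}",
    ]
    for i, ref in enumerate(references, start=1):
        lines.append(f"[{i}] {ref}")
    lines.append(f"Candidate: {candidate}")
    lines.append(
        f"Reply with the position (1 to {len(references) + 1}) where the "
        "candidate belongs in this ranking."
    )
    return "\n".join(lines)


def rank_candidate(
    question: str,
    references: List[str],
    candidate: str,
    llm: Callable[[str], str],  # hypothetical: wraps whatever LLM client is used
) -> int:
    """Return the 1-based position the LLM assigns to the candidate."""
    reply = llm(build_listwise_prompt(question, references, candidate))
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"no position found in reply: {reply!r}")
    position = int(match.group())
    # Position 1 means better than every reference; len + 1 means worse than all.
    if not 1 <= position <= len(references) + 1:
        raise ValueError(f"position {position} is out of range")
    return position
```

In use, `llm` would wrap a real model call; a stub such as `lambda prompt: "Position: 2"` is enough to exercise the parsing. A lower returned position indicates a better candidate, so the value would need to be inverted before being compared with human scores where higher is better.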

Taxonomy

Topics: Rough Sets and Fuzzy Logic