Evaluating Generative Language Models in Information Extraction as Subjective Question Correction

Yuchen Fan, Yantao Liu, Zijun Yao, Jifan Yu, Lei Hou, Juanzi Li

arXiv:2404.03532 · cs.CL · April 5, 2024 · 1 citation

TL;DR

This paper introduces SQC-Score, an evaluation method for information extraction that combines LLMs with natural language inference (NLI). It targets two weaknesses of conventional evaluation, imprecise matching metrics and incomplete benchmarks, and yields more accurate estimates of model performance.

Contribution

The paper proposes SQC-Score, an evaluation approach that matches model outputs to gold labels by meaning rather than surface form and credits correct answers that the benchmark annotations omitted, giving a fairer assessment of LLMs on information extraction.
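The two ideas in the contribution (semantic matching instead of string identity, and credit for correct answers missing from the annotations) can be sketched as a precision/recall/F1 computation with two pluggable judges. This is a hypothetical illustration, not the official SQC-Score implementation in thu-keg/sqc-score: `sqc_style_f1`, `toy_match`, and `toy_entail` are invented names, the normalising string matcher stands in for the fine-tuned LLM, and the set lookup stands in for an NLI model. How the paper actually aggregates scores may differ.

```python
from typing import Callable, Sequence

# Hypothetical judge signatures (assumptions, not the paper's API):
Matcher = Callable[[str, str], bool]  # stand-in for an LLM judge: does pred match this gold item?
Entailer = Callable[[str], bool]      # stand-in for an NLI model: is pred supported by the source text?


def sqc_style_f1(preds: Sequence[str], gold: Sequence[str],
                 matches: Matcher, entailed: Entailer) -> dict[str, float]:
    """Illustrative P/R/F1 with semantic matching and gold-label enrichment.

    NOT the official SQC-Score; it only mirrors the two ideas in the summary.
    """
    gold = list(gold)
    matched_gold: set[int] = set()
    correct = 0
    extra_gold = 0  # predictions accepted as correct answers the annotators omitted
    for p in preds:
        # Greedy one-to-one matching against gold items not yet used.
        hit = next((i for i, g in enumerate(gold)
                    if i not in matched_gold and matches(p, g)), None)
        if hit is not None:
            matched_gold.add(hit)
            correct += 1
        elif entailed(p):
            # Omitted-but-correct answer: counts as correct AND enlarges the gold set,
            # so recall stays at or below 1.
            correct += 1
            extra_gold += 1
    n_gold = len(gold) + extra_gold
    precision = correct / len(preds) if preds else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def toy_match(p: str, g: str) -> bool:
    # Hypothetical stand-in for the LLM matcher: case- and whitespace-insensitive equality.
    return " ".join(p.lower().split()) == " ".join(g.lower().split())


SUPPORTED = {"(Paris, capital of, France)"}  # hypothetical NLI verdicts


def toy_entail(p: str) -> bool:
    # Hypothetical stand-in for the NLI model.
    return p in SUPPORTED


preds = ["(Barack Obama, born in, Hawaii)",
         "(Paris, capital of, France)",
         "(Paris, located in, Germany)"]
gold = ["(barack obama, born in, hawaii)",
        "(Obama, spouse, Michelle Obama)"]
print(sqc_style_f1(preds, gold, toy_match, toy_entail))
```

With the toy judges, the case variant of the first gold triple and the unannotated but supported second prediction both count as correct, so precision, recall, and F1 all come out at 2/3; swapping in exact string equality and an always-false entailer drops all three to 0.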

Findings

01. Human annotators prefer SQC-Score over the baseline metrics.

02. SQC-Score provides a more accurate evaluation of LLMs on information extraction.

03. The method addresses both metric imprecision and benchmark incompleteness.

Abstract

Modern large language models (LLMs) have shown remarkable ability on a wide range of tasks that require sophisticated cognitive behavior. Paradoxically, these models appear to underperform on seemingly elementary tasks such as relation extraction and event extraction. This apparent gap stems from two issues in conventional evaluation: (1) existing evaluation metrics are imprecise and struggle to gauge semantic consistency between model outputs and the ground truth, and (2) evaluation benchmarks are inherently incomplete, primarily because of restrictive human annotation schemas, so LLM performance is underestimated. Inspired by the principles of subjective question correction, we propose a new evaluation method, SQC-Score. This method uses LLMs, fine-tuned on subjective question correction data, to refine matching between…
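The two evaluation issues can be made concrete with a toy relation-extraction example (hypothetical triples, not data from the paper). Under strict set-based exact-match F1, a common baseline for this kind of evaluation, a paraphrased relation label and a correct fact that the annotators never recorded are both scored as errors. `exact_match_f1` is an illustrative helper, not code from the paper.

```python
def exact_match_f1(preds: list[str], gold: list[str]) -> float:
    """Strict set-based F1: a prediction counts only if it is string-identical to a gold item."""
    pred_set, gold_set = set(preds), set(gold)
    tp = len(pred_set & gold_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example; assume the source sentence states all three facts.
gold = ["(Marie Curie, award, Nobel Prize in Physics)",
        "(Marie Curie, spouse, Pierre Curie)"]
preds = ["(Marie Curie, award, Nobel Prize in Physics)",    # identical to gold: credited
         "(Marie Curie, married to, Pierre Curie)",          # paraphrase of a gold triple: not credited (issue 1)
         "(Marie Curie, award, Nobel Prize in Chemistry)"]   # true but unannotated: not credited (issue 2)
print(exact_match_f1(preds, gold))
```

Only one of the three predictions is credited, so precision is 1/3, recall is 1/2, and F1 is about 0.4, even though every prediction is arguably correct.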

Code & Models

Repositories

thu-keg/sqc-score (PyTorch, official implementation)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems