Assessing the quality of information extraction

Filip Seitl; Tomáš Kovářík; Soheyla Mirshahi; Jan Kryštůfek; Rastislav Dujava; Matúš Ondreička; Herbert Ullrich; Petr Gronat

arXiv:2404.04068 · cs.CL · May 24, 2024 · 1 citation

Open Access · 4 Reviews

TL;DR

This paper presents an automatic framework for evaluating the quality and completeness of information extraction by large language models, addressing challenges like input size limitations and providing interpretative scores.

Contribution

It introduces a novel automated assessment method for information extraction quality, focusing on entity and property extraction, with performance analysis and scoring strategies.

Findings

01. Proposes an objective quality assessment framework
02. Analyzes LLM performance in information extraction tasks
03. Provides interpretative scores for extraction quality

Abstract

Advances in large language models have notably enhanced the efficiency of information extraction from unstructured and semi-structured data sources. As these technologies become integral to various applications, establishing an objective measure for the quality of information extraction becomes imperative. However, the scarcity of labeled data presents significant challenges to this endeavor. In this paper, we introduce an automatic framework to assess the quality of the information extraction/retrieval and its completeness. The framework focuses on information extraction in the form of entity and its properties. We discuss how to handle the input/output size limitations of the large language models and analyze their performance when extracting the information. In particular, we introduce scores to evaluate the quality of the extraction and provide an extensive discussion on how to…
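The abstract does not define the scores themselves, and the reviews below refer to a "needle" approach and a MINEA score without giving formulas. As a purely illustrative sketch of the general idea (not the authors' method), the following inserts synthetic "needle" sentences with known entities into a text and measures the fraction an extractor recovers. All names here (`insert_needles`, `needle_recall`, `toy_extractor`, the sample entities) are hypothetical, and `toy_extractor` is a stand-in for a real LLM extraction call.

```python
import random


def insert_needles(paragraphs, needles, seed=0):
    """Return a copy of `paragraphs` with each needle sentence inserted at a random position."""
    rng = random.Random(seed)
    mixed = list(paragraphs)
    for needle in needles:
        # randint is inclusive, so the needle may also be appended at the end.
        mixed.insert(rng.randint(0, len(mixed)), needle["text"])
    return mixed


def needle_recall(needles, extracted_entities):
    """Fraction of needles whose entity name appears among the extracted names."""
    if not needles:
        return 0.0
    found = {name.strip().lower() for name in extracted_entities}
    hits = sum(1 for n in needles if n["name"].lower() in found)
    return hits / len(needles)


def toy_extractor(paragraphs, known_names):
    """Hypothetical stand-in for an LLM extractor: returns the known names mentioned in the text."""
    text = " ".join(paragraphs).lower()
    return [name for name in known_names if name.lower() in text]


# Assumed toy data: two ordinary paragraphs and two synthetic needles.
haystack = ["The meeting opened at nine.", "Budget items were reviewed."]
needles = [
    {"name": "Zorvex Ltd", "text": "Zorvex Ltd supplies cooling units."},
    {"name": "Quillon Bay", "text": "Quillon Bay hosts the annual audit."},
]

document = insert_needles(haystack, needles)
# The toy extractor only "knows" one of the two needle entities.
extracted = toy_extractor(document, ["Zorvex Ltd", "Meeting"])
score = needle_recall(needles, extracted)
```

Because the needles are generated rather than annotated, a score of this kind needs no labeled data; exact string matching is the simplest possible matching rule and is an assumption of this sketch.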

Peer Reviews

Decision · Submitted to NeurIPS 2024

Reviewer 01 · Rating 3 · Confidence 4

Strengths

s1. The introduction of the MINEA score is somewhat innovative.
s2. The paper gives clear explanations of the proposed framework.

Weaknesses

w1. Lack of Originality: The originality of the paper is insufficient. Related work has already mentioned using the "needle" method to evaluate the information extraction capabilities of LLMs. While this paper adds the use of large models to help create the needles, the contribution is still lacking.
w2. Insufficient Experimental Description: The description of the experimental setup is missing, including the experimental environment, data sources, and dataset sizes. However, the paper spends t

Reviewer 02 · Rating 1 · Confidence 5

Strengths

It tries to address a relevant problem in the field (curated benchmark data is hard to come by).

Weaknesses

- The paper is from the start extremely vague and misses concrete statements and explanations about the work done. The contributions are unclear, the data is essentially undefined, for most of the work what exactly is being done is simply unclear.
- Even the task of "Information Extraction" is not concretely described in a way that is reproducible.
- Line 7-8: "The framework focuses on information extraction in the form of entity and its properties".
- Table 1: it is completely lost upo

Reviewer 03 · Rating 3 · Confidence 4

Strengths

1. This paper analyses the technical limitations of LLMs that complicate the extraction of information from a long context.
2. This paper proposes inserting a needle into the data to evaluate the performance of IE without labeled data.

Weaknesses

1. The analysis of the performance of LLMs in IE is not new and has already been the subject of various analyses, such as in the following papers:
[1] Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness (Li et al., 2023)
[2] Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors (Han et al., 2023)
[3] When does In-context Learning Fall Short and Why? A Study on Specificatio

Reviewer 04 · Rating 3 · Confidence 4

Strengths

An interesting application of `Needle in a haystack` evaluation in information extraction.

Weaknesses

* The writing quality is not great, and several areas require further clarification
* The current paper structure is confusing; not sure what role Sections 3 and 4 play in this paper, e.g., whether the authors were proposing a new LLM-based IE approach
* I suggest providing a formal definition of IE studied in this paper because it is very confusing to know what information is extracted. For example, in the abstract, `entity and its properties` is mentioned; in Section 3, `short paragraphs of

Taxonomy

Topics: Data Quality and Management