Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness

Bo Li; Gexiang Fang; Yang Yang; Quansen Wang; Wei Ye; Wen Zhao; Shikun Zhang

arXiv:2304.11633·cs.CL·April 25, 2023·59 cites



TL;DR

This paper systematically evaluates ChatGPT's performance, explainability, calibration, and faithfulness across seven information extraction tasks, revealing strong OpenIE performance and trustworthy explanations, but poor Standard-IE performance and overconfident predictions.

Contribution

The study provides a comprehensive analysis of ChatGPT's IE capabilities, introduces new evaluation metrics, and releases annotated datasets to advance research in this area.

Findings

01

ChatGPT excels in open IE but performs poorly in standard IE.

02

It offers high-quality explanations but is overconfident in predictions.

03

It demonstrates high faithfulness to the original text in most cases.

Abstract

The capability of Large Language Models (LLMs) like ChatGPT to comprehend user intent and provide reasonable responses has made them extremely popular lately. In this paper, we focus on assessing the overall ability of ChatGPT using 7 fine-grained information extraction (IE) tasks. Specifically, we present a systematic analysis measuring ChatGPT's performance, explainability, calibration, and faithfulness, resulting in 15 keys from either ChatGPT or domain experts. Our findings reveal that ChatGPT's performance in the Standard-IE setting is poor, but it surprisingly exhibits excellent performance in the OpenIE setting, as evidenced by human evaluation. In addition, our research indicates that ChatGPT provides high-quality and trustworthy explanations for its decisions. However, ChatGPT is overconfident in its predictions, which results in low…
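
To make the overconfidence finding concrete, here is a minimal sketch of expected calibration error (ECE), a common way to quantify the gap between a model's stated confidence and its actual accuracy. This is an illustration under assumptions, not the paper's evaluation code: the function name, the equal-width binning, and the default of 10 bins are all choices made here for the example.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: equal-width bins over [0, 1]
) -> float:
    """Weighted average of |mean confidence - accuracy| over confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group each prediction by the confidence bin it falls into.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes in proportion to how many predictions it holds.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

An overconfident model reports high confidence on predictions that are often wrong, so its mean confidence sits well above its accuracy and the ECE is large; a well-calibrated model yields a value near zero.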


Code & Models

Repositories

pkuserc/chatgpt_for_ie
PyTorch · Official


Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification

Methods: Test