Oracle-Checker Scheme for Evaluating a Generative Large Language Model

Yueling Jenny Zeng; Li-C. Wang; Thomas Ibbetson

arXiv:2405.03170 · cs.CL · May 7, 2024

Open Access

TL;DR

This paper introduces an oracle-checker scheme for evaluating the answers of a generative large language model, with a property-testing checker applied to entity extraction and a program-checking checker applied to paraphrase decision.

Contribution

The paper proposes an evaluation framework, the oracle-checker scheme, with two types of checkers for assessing LLM answers: one based on property testing and one based on program checking.

Findings

01 The property-testing checker is demonstrated on entity extraction.

02 The program-checking checker is demonstrated on paraphrase decision.

03 The same oracle-checker scheme covers both contexts, using a different checker type for each.

Abstract

This work presents a novel approach, called the oracle-checker scheme, for evaluating the answer given by a generative large language model (LLM). Two types of checkers are presented. The first type follows the idea of property testing. The second type follows the idea of program checking. Their applications are demonstrated in two separate contexts: entity extraction and paraphrase decision, respectively.
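
To make the two checker types more concrete, here is a minimal Python sketch. It is not the authors' implementation. The properties checked are assumptions chosen only for illustration: additivity of entity extraction under concatenation, and symmetry of the paraphrase decision. The names `additivity_score`, `asymmetric_pairs`, and the toy oracles are hypothetical. In a real setting each oracle would wrap an LLM call.

```python
import random
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical oracle signatures; in practice each would wrap an LLM call.
EntityOracle = Callable[[str], Set[str]]
ParaphraseOracle = Callable[[str, str], bool]


def additivity_score(oracle: EntityOracle, texts: List[str],
                     trials: int = 20, seed: int = 0) -> float:
    """Property-testing style check (assumed property, not from the paper).

    Samples random pairs of passages and tests whether the entities
    extracted from their concatenation equal the union of the entities
    extracted from each passage alone. Returns the fraction of sampled
    pairs that satisfy the property.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        a, b = rng.sample(texts, 2)  # two distinct passages
        if oracle(a + " " + b) == oracle(a) | oracle(b):
            passed += 1
    return passed / trials


def asymmetric_pairs(oracle: ParaphraseOracle,
                     pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Program-checking style check (assumed consistency condition).

    A paraphrase decision should not depend on argument order, so the
    checker re-queries the oracle with the arguments swapped and returns
    every pair on which the two answers disagree.
    """
    return [(a, b) for a, b in pairs if oracle(a, b) != oracle(b, a)]


# Toy stand-ins for an LLM, used only to exercise the checkers.
KNOWN_ENTITIES = {"Alice", "Bob", "Paris"}


def toy_entity_oracle(text: str) -> Set[str]:
    """Return the known entity names that occur as words in the text."""
    words = {w.strip(".,") for w in text.split()}
    return words & KNOWN_ENTITIES


def toy_paraphrase_oracle(a: str, b: str) -> bool:
    """Treat two sentences as paraphrases if they use the same words."""
    return sorted(a.lower().split()) == sorted(b.lower().split())
```

Neither checker needs ground-truth labels: each one only compares the oracle's answers against each other. A low `additivity_score` or a non-empty result from `asymmetric_pairs` flags answers that deserve closer inspection, while a clean result does not prove that the answers are correct.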

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques